Abstract

Resilient  Control  and  Intrusion  Detection  for  SCADA  Systems 

by 

Xia  Bonnie  Zhu 

Doctor  of  Philosophy  in  Engineering  -  Electrical  Engineering  and  Computer  Science 

University  of  California,  Berkeley 
Professor  S.  Shankar  Sastry,  Chair 


Supervisory Control and Data Acquisition (SCADA) systems are deeply ingrained in the fabric
of critical infrastructure sectors. These computerized real-time process control systems, which
operate over geographically dispersed continuous distribution operations, are increasingly subject
to serious damage and disruption by cyber means due to their standardization and connectivity to
other networks. However, SCADA systems generally have little protection from the escalating cyber
threats. To achieve defense-in-depth for SCADA systems by means of intrusion detection and
resilient control, this dissertation strives for a robust stochastic signal and system approach
without being overly pessimistic. Its main elements are (1) two comprehensive SCADA-specific
taxonomies, one on cyber attacks and the other on intrusion detection systems, to map out the lay
of the land and shed light on the problem space; (2) an overall framework/architecture for
intrusion detection and resilient control, Xware; (3) its measurement fusion assurance component,
the Trust Counter; (4) a signal-based early-detection and resilient estimation scheme with proven
theoretical performance bounds, for SCADA systems in general, namely the Robust Generalized
Likelihood Ratio Test (RGLRT), which is generic enough to have been applied to linear dynamical
systems in general and beyond; (5) the application of RGLRT to network traffic anomaly detection;
and (6) the application of RGLRT to anomaly detection for SCADA systems in smart grids through
model construction and identification for both clean renewable energy supply and variable
consumer demand.

First, in order to understand the potential danger and to protect SCADA systems, we highlight
their differences from standard Information Technology (IT) systems and present a set of security
property goals. Furthermore, we systematically identify and classify likely cyber attacks,
including cyber-induced cyber-physical attacks, on SCADA systems according to the SCADA hierarchy.
The attack categorization criteria, determined by the impact on the control performance of SCADA
systems, stress the commonalities and important features of such attacks that define the unique
challenges of securing SCADA systems versus traditional IT systems.

Second, in order to address the challenge of how to modify conventional IT intrusion
detection techniques to suit the needs of SCADA, we explain the nuances of SCADA-specific
intrusion detection and frame the task in terms of interest to control researchers in order to
illuminate the problem space. We present a taxonomy and a set of metrics for SCADA-specific
intrusion detection techniques that highlight their possible use in SCADA systems. In
particular, we enumerate the Intrusion Detection Systems (IDS) that have been proposed to
undertake this endeavor. Drawing upon the discussion, we identify the deficits and voids in current
research. Based upon this taxonomy and an analysis of which SCADA-specific IDS strategies are
most likely to succeed, we offer recommendations and future research avenues, in part by
presenting a prototype of such efforts.

Third, we present Xware, the overall architecture for intrusion detection and resilient control.
It rests on two strong footings: Normalcy Checking, a control-theoretic, domain-knowledge-specific,
specification-based payload inspection system, and a high-speed, real-time, behavior-based
Network Intrusion Detection System (NIDS). Xware integrates a Trust Counter to verify the
truthfulness of sensor measurements. It also guards against exfiltration of confidential
information from within the intranet. Moreover, Xware hardens the SCADA system with compensation
schemes to guarantee its performance when an intrusion evades the NIDS or an unexpected fault
occurs. The architecture puts the individual components in perspective and highlights the overall
systematic and holistic approach.

Fourth, we propose the Trust Counter to deal with cases in which disruption from cyber attacks
affects the Kalman filter, the primary recursive estimation method used in control engineering.
To improve such estimation, data fusion may take place at a central location that fuses and
processes multiple sensor measurements delivered over the network. In an uncertain networked
control system where the nodes and links are subject to attacks, false, compromised or missing
individual readings can produce skewed results. To assure the validity of data fusion, a
centralized trust rating system is proposed. It evaluates the trustworthiness of each sensor
reading on top of the fusion mechanism. The ratings are represented by the Beta distribution,
which is the conjugate prior of the binomial distribution, so the posterior is again a Beta
distribution. An illustrative example then demonstrates its efficiency.

Fifth, RGLRT is an early anomaly detection and resilient estimation scheme for cyber-physical
systems, networked control systems to be specific, in an uncertain network environment.
It robustly identifies and detects outliers among real-time multidimensional measurements of
dynamical systems by using an online window-limited sequential Robust Generalized Likelihood
Ratio (RGLR) test, without any prior knowledge of the occurrence time and distribution of the
outliers. The robust sequential testing and quick detection scheme achieves the optimal stopping
time with low rates of both false alarm and misdetection. We propose a set of qualitative and
quantitative metrics to measure its optimality in the context of cyber-physical systems. Further,
this resilient and flexible estimation scheme robustly rectifies and cleans data in the presence
of both isolated and patchy outliers while maintaining the optimality of the Kalman filter under
nominal conditions. The approximate optimality of its robustification performance is shown
through stochastic approximation.

Sixth, we give a network anomaly detection scheme as one application of RGLRT.
The Autoregressive Integrated Moving Average (ARIMA) time series model is widely used,
including in network security applications. Model building and anomaly detection based
on such models are often a first and important step towards monitoring unexpected problems and
assuring the soundness and security of the systems being studied. The time variability of the
coefficients in those dynamic regression models is particularly relevant and possibly indicative.
To address this issue, we propose a corresponding framework and a novel anomaly detection
approach: a Kalman filter identifies the dynamic models, including their parameters, and a
Generalized Likelihood Ratio (GLR) test detects suspicious changes in the parameters and
therefore in the models. Experiments illustrate the idea and show its promising potential in
terms of accuracy and robustness.

Seventh, we apply RGLRT to anomaly detection for SCADA systems in smart grids. While
the utilization of clean energy resources, including wind and solar power, is set to grow from
filling the gap at peak hours to taking a larger share in the upcoming smart grid and efficient
infrastructure, price-incentivized electricity consumption should alleviate peak-hour load and
reduce power outages. Both benign faults and malicious attacks threaten the reliability and
availability of the new grid. We address these twin problems from the angle of one fundamental
technique that both rely on. ARIMA time series models play roles at both ends of this new
ecosystem: predicting the variable clean energy resource on the supply side and forecasting the
flexible load demand on the consumer side. Model construction and anomaly detection based on such
models are often a first and important step towards monitoring unexpected problems and assuring
the soundness and security of the systems being studied. The time variability of the coefficients
in those dynamic regression models is particularly relevant and possibly indicative. Thus we
introduce a corresponding framework and a novel anomaly detection approach, based on a
robustified Kalman filter for identifying those dynamic models, including their parameters, and
an RGLRT for detecting suspicious changes in the parameters and therefore in the models.
Currently, the effectiveness and robustness of this method are shown through simulation.
Chapter 1
Introduction 


Due to their standardization and connectivity to other networks, Supervisory Control and Data
Acquisition (SCADA) systems are increasingly subject to damage and disruption by cyber means.
However, securing SCADA systems faces several obstacles: (1) regulation-wise, a lack of policies
or standards; (2) technology-wise, the need for availability, integrity and confidentiality is
met only by limited specialized solutions; (3) economics- and finance-wise, a lack of economic
justification; and (4) market-wise, these are legacy systems for which operators show little
demand for security because organizational priorities conflict.

In particular, SCADA systems present challenges for security engineering due to their
requirements for continuous availability and real-time operation, their potential impact on the
populace and the physical world, and their legacy deployments. They further play crucial roles in
the fabric of critical infrastructure such as electric power grids, water distribution systems,
petroleum and natural gas pipelines, and manufacturing operations.

The cyber-physical security of real-time, continuous systems necessitates a comprehensive
view and holistic understanding of network security, control theory and the physical system.
Ultimately, any viable technical solutions and research directions in securing SCADA systems must
lie at the intersection of computer security, communication networks and control engineering.
However, the very large installed base of such systems means that in many instances we must, for
a long time to come, rely on retrofitted security mechanisms rather than having the option to
design them in from scratch. This leads to a pressing need for robust SCADA-specific intrusion
detection systems (IDS) and resilient control.

The goals of this effort are to develop IDS and resilient control technology that can
(1) efficiently detect and block cyber intrusions into SCADA systems in entrenched operational
environments, in real time, (2) without interrupting the control performance of the protected
system, (3) without creating extra operational burden or operational reservations due to false
alarms, (4) in the presence of both malicious traffic and messy but benign network traffic, and
(5) rectify and compensate the system performance in case some intrusions succeed. The system
must operate in a real-time, robust fashion, with performance adequate to meet the demands of the
dynamic cyber-physical interactions inherent to SCADA systems.

To this end, we formulate a number of objectives:


•  Conceptualize control-performance-oriented metrics for the security measures mentioned above.

•  Develop usage- and goal-oriented taxonomies of cyber attacks on SCADA systems and of
SCADA-specific IDS to shed light on the problem domain.

•  Establish prudent and plausible threat models.

•  Characterize the system architecture, protocol use, network topology, and network activity
of SCADA systems, particularly those used in the power grid.

•  Create models of both normal operation and the allowed range of operation (à la
specification-based intrusion detection) to enable detection of new attacks while maintaining
low false alarm rates during legitimate changes of a SCADA system’s dynamics and permitted
variations in its traffic, including valid safety system responses in extreme cases. Unique to
this problem domain, such models can draw upon insight into expected and allowed behavior
that we can derive “analytically” from the underlying control system principles and
properties.

•  Find asymptotic performance bounds on these models.

•  Integrate a network IDS with these models to enable resilient, defense-in-depth,
SCADA-domain network monitoring, with online data cleaning and control compensation in case
certain intrusions succeed.

•  Construct a test environment to verify the IDS performance in terms of its resistance to
evasion and its ability to detect and block attacks against a given SCADA system with an
acceptably low false alarm rate.

•  Conduct experiments to confirm the system’s resilience level in case certain attacks succeed.
Chapter 2

A  Taxonomy  of  Cyber  Attacks  on  SCADA 
Systems 


Example  is  the  school  of  mankind,  and 
they  will  learn  at  no  other. 

Letters  on  a  Regicide  Peace 
Edmund  Burke 

Supervisory Control and Data Acquisition (SCADA) systems are deeply ingrained in the fabric
of critical infrastructure sectors. These computerized real-time process control systems, which
operate over geographically dispersed continuous distribution operations, are increasingly
subject to serious damage and disruption by cyber means due to their standardization and
connectivity to other networks. However, SCADA systems generally have little protection from the
escalating cyber threats. In order to understand the potential danger and to protect SCADA
systems, in this chapter we highlight their differences from standard IT systems and present a
set of security property goals. Furthermore, we focus on systematically identifying and
classifying likely cyber attacks, including cyber-induced cyber-physical attacks, on SCADA
systems. The attack categorization criteria, determined by the impact on the control performance
of SCADA systems, highlight commonalities and important features of such attacks that define the
unique challenges of securing SCADA systems versus traditional Information Technology (IT)
systems.

The utilization of Supervisory Control and Data Acquisition (SCADA) systems facilitates
management by providing remote access to real-time data and a channel to issue automated or
operator-driven supervisory commands to remote station control devices, or field devices. They
are the underlying control systems of most critical national infrastructures, including power,
energy, water, transportation and telecommunication, and are widely used in vital enterprises
such as pipelines, manufacturing plants and building climate control.

Remote locations and proprietary industrial networks used to give SCADA systems a considerable
degree of protection through isolation [153, 78]. Most industrial plants now employ networked
process historian servers for storing process data and other possible business and process
interfaces. The adoption of Ethernet and Transmission Control Protocol/Internet Protocol (TCP/IP)
for process control networks, and of wireless technologies such as IEEE 802.x and Bluetooth, has
further reduced the isolation of SCADA networks. The connectivity and de-isolation of SCADA
systems is illustrated in Figure 2.1.


Figure 2.1: Typical SCADA Components. Source: United States Government Accountability Office
Report GAO-04-354 [78]


Furthermore, the recent trend towards standardization of the software and hardware used in SCADA
systems makes it even easier to mount SCADA-specific attacks. Thus the security of SCADA
systems can no longer rely on obscurity or on simply locking down a system.

These attacks can disrupt and damage critical infrastructural operations, cause major economic
losses, contaminate the ecological environment and, even more dangerously, claim human lives.

The British Columbia Institute of Technology’s Internet Engineering Lab (BCIT/IEL) maintains
an industrial cyber security incident database [28] with more than 120 incidents logged since
its initiation. Baker et al. at McAfee, in their 2011 sequel report [19], surveyed 200 IT security
executives in 14 countries from critical electricity infrastructure enterprises, where SCADA
systems are widely used, and found that most facilities had been under cyber attack.

One of the most sophisticated pieces of SCADA malware known to date¹, Stuxnet, according to
Falliere et al. at Symantec [70], takes advantage of multiple Windows zero-day vulnerabilities and
targets the command-and-control software installed in industrial control systems worldwide. It
sabotages facilities by reprogramming Programmable Logic Controllers (PLCs) to operate as the
attackers intend, most likely outside their specified boundaries, while its “misreporting”
feature hides the incident from the network operations center. As of April 21, 2011, more than 50
new Stuxnet-like SCADA threats have been discovered [194].

¹ In McAfee’s report [19], nearly half of those surveyed in the electric industry said that they
had found Stuxnet on their systems.
Most related work has focused on the classification and categorization of attacks on standard
IT systems [104, 115, 144], communication standards and/or protocols [167], and communication
devices [171]. Other work enumerates possible attacks on small embedded systems [82, 225]. More
recently, SCADA-specific security solutions have been proposed [75] and SCADA-specific Intrusion
Detection Systems (IDS) have been evaluated [302].

The remainder of this chapter is organized as follows. Section 2 compares SCADA systems
with standard IT systems in terms of the properties that contribute to their security concerns.
Section 3 defines the desired security properties, trust model and threat model. Section 4 states
the vulnerabilities embedded in SCADA systems. Sections 5, 6 and 7 enumerate cyber attacks on
hardware, software and communication stacks, respectively. Section 8 concludes.


2.1  Difference  from  IT 

In SCADA systems, or control systems in general, the fact that any logic execution within the
system has a direct impact on the physical world makes safety paramount. Being on the front
line, directly facing human lives and the ecological environment, the field devices in SCADA
systems are deemed no less important than central hosts² [42]. Also, certain operating
systems and applications running on SCADA systems, which are unconventional to typical IT
personnel, may not operate correctly with commercial off-the-shelf IT cyber security solutions.

Furthermore, factors such as the demand for continuous availability, time-criticality,
constrained computation resources on edge devices, a large physical base, a wide interface
between digital and analog signals, social acceptance (including cost effectiveness and user
reluctance to change), and legacy issues make securing SCADA systems a peculiar security
engineering task.

SCADA systems are hard real-time systems [251] because the completion of an operation after
its deadline is considered useless and can potentially cause a cascading effect in the physical
world. The operational deadlines from event to system response impose stringent constraints:
missing a deadline constitutes a complete failure of the system. Latency is very destructive to
a SCADA system’s performance: if the system does not react within a certain time frame, the
result can be a great loss in safety, such as damage to the surroundings or a threat to human
lives.

What distinguishes a hard real-time system from a soft real-time system is not the length of
the time frame but whether the deadline must be met. Soft real-time systems, such as live
audio-video systems, may tolerate a certain latency and respond with decreased service quality,
e.g. by dropping frames while displaying a video. Minor violations of time constraints in soft
real-time systems lead to degraded quality rather than system failure.
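
To make the distinction concrete, the following minimal Python sketch contrasts the two notions.
It is an illustration under stated assumptions only: the names TaskResult, hard_outcome and
soft_quality, and the linear degradation over a grace window, are hypothetical and are not taken
from any SCADA product or standard.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """Timing of one operation, in seconds after the triggering event."""
    finish_time: float
    deadline: float


def hard_outcome(result: TaskResult) -> str:
    """Hard real-time view: any completion after the deadline is a failure."""
    return "ok" if result.finish_time <= result.deadline else "failure"


def soft_quality(result: TaskResult, grace: float) -> float:
    """Soft real-time view: quality in [0, 1] degrades with lateness.

    Assumption: quality falls linearly to zero over a `grace` window
    after the deadline (an illustrative choice, not a standard model).
    """
    lateness = max(0.0, result.finish_time - result.deadline)
    return max(0.0, 1.0 - lateness / grace)
```

For the same late result, the hard real-time view reports a failure regardless of how small the
overrun is, while the soft real-time view reports a reduced but non-zero quality as long as the
overrun stays within the grace window.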

Furthermore, due to their physical nature, the tasks performed by a SCADA system and the
processes within each task often need to be interrupted and restarted. The timing aspect and task
interrupts can preclude the use of conventional block encryption algorithms.

Because SCADA components run on real-time operating systems (RTOS), vulnerability also arises
from the fact that memory allocation is even more critical in an RTOS than in other operating
systems. Many field-level devices in SCADA systems are embedded systems that run for years
without rebooting while accumulating fragmentation.

Thus, buffer overflow is more problematic in SCADA than in traditional IT.

² Arguably, though, a compromised central server/controller may cause severe harm if the field
devices do not have their own individual and local protection.


2.2  Problem  Statement 

Before we state the security properties that are desirable for SCADA systems, we
must point out that there are many trade-offs between security and control performance goals.
We will also group attacks according to the hierarchy of the SCADA system.

2.2.1  Security  Property  Goal 

Control systems have many characteristics that differ from those of traditional IT systems in
terms of risks and operational priorities. These give rise to unique performance and reliability
requirements, in addition to the use of operating systems and applications that are
unconventional to typical IT personnel.

Even where security is well defined, the primary goal in the Internet is to protect the central
server and not the edge client. In process control, an edge device, such as a PLC or smart drive
controller, does not necessarily merit less importance than a central host such as a data
historian server [42], as edge devices are on the front line facing human lives and the
ecological environment.

These  differences  between  SCADA  systems  and  IT  systems  demand  an  adjusted  set  of  security 
property  goals  and  thus  security  and  operational  strategies. 

In the traditional IT community, the common set of desirable security properties is
confidentiality, integrity and availability, or CIA for short. In the IT world, confidentiality
and integrity are paramount, while in control systems system availability and data integrity
come first, as a result of human and plant safety being their primary responsibility.

In particular, most computer security research focuses on confidentiality. To be SCADA-specific,
we prioritize the security properties of SCADA systems in the order of their importance
and desirability in industry, especially in the control engineering sector. The modification we
make addresses the special needs arising from the unique characteristics of SCADA systems, namely
time criticality, wide geographic distribution and continuous availability.

There are different versions of the definition and use of security properties [12], with slight
variations. However, in order to differentiate control systems from standard IT systems, it is
necessary for us to stress and explain some of the more relevant subtleties. This is not to say
that the properties we highlight are mutually exclusive; they do overlap.

Timeliness 

explicitly expresses the time-criticality of control systems, which follows from their being
real-time systems, and the concurrency in SCADA systems, which follows from their being widely
dispersed distributed systems.

It includes both the responsiveness of the system, e.g. a command from a controller to an
actuator should be executed in real time by the latter, and the timeliness of any related data
being delivered in its designated time period, by which we also mean the freshness of data, i.e.,
the data is only valid in its designated time period. In a more general sense, this property
requires that any queried, reported, issued and disseminated information shall not be stale but
shall correspond to real time, and that the system is able and sensitive enough to process
requests, whether from normal operation or from legitimate human intervention, in a timely
fashion, such as within a sampling period. In reality, a message that arrives late or repeatedly
at the specified node is no longer any good, be it a correct command to an actuator or a perfect
measurement from a sensor with intact content. As a matter of fact, any replay of data easily
breaches this security goal.
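
The freshness requirement can be sketched in a few lines of Python. This is a hedged
illustration, not a mechanism proposed in this chapter: the class name FreshnessChecker, the
per-sender sequence numbers and the max_age window are assumptions introduced for the example,
and the sketch further assumes that sender timestamps are authentic and that clocks are
securely synchronized.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FreshnessChecker:
    """Rejects stale, repeated, or out-of-order messages (illustrative)."""
    max_age: float  # validity window in seconds, e.g. one sampling period
    last_seq: Dict[str, int] = field(default_factory=dict)

    def accept(self, sender: str, seq: int, sent_at: float, now: float) -> bool:
        # Stale: the data has outlived its designated time period.
        if now - sent_at > self.max_age:
            return False
        # Replayed or reordered: sequence number is not strictly newer
        # than the last one accepted from this sender.
        if seq <= self.last_seq.get(sender, -1):
            return False
        self.last_seq[sender] = seq
        return True
```

A message is accepted only if it is both within its validity window and strictly newer than the
last message accepted from the same sender, so an intact but late or replayed message is
discarded.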

Moreover, this property also implies an order of updates among peer sensors, especially if
they are observing the same process or correlated processes. The order in which data arrives at
the central monitoring room may play an important role in the representation of process dynamics
and affect the correct decision making of either the control algorithms or the supervising human
operators.

In a nutshell, all the right data should be processed at the right time, which reveals an
underpinning security goal: secure time provision.

Availability 

means that any component of a SCADA system, be it a sensory or servomechanical device,
communication or networking equipment, or a radio channel, as well as any computation resource
and any information that is transmitted or resides within the system, such as sensor readings
and controller commands, should be ready for use when needed. Most SCADA-controlled processes
are continuous in nature. Unexpected outages of systems that control industrial processes are not
acceptable. This property, desired for both the control performance and the security of SCADA
systems, requires that the security mechanisms applied to SCADA systems, including but not
limited to the overall cryptographic system, shall not degrade the maintainability, operability
and emergency accessibility that the original SCADA system had without those security-oriented
add-ons.

Integrity 

requires that data generated, transmitted, displayed or stored within a SCADA system be genuine
and intact, without unauthorized intervention. This covers its content, which may include the
header with its source, destination and time information in addition to the payload itself. A
closely related term is authenticity; in the context of SCADA systems, it implies that the
identity of the sender and receiver of any information shall be genuine. Under our definition of
integrity, authenticity falls within the same category. One can imagine how disastrous the
consequences can be if a control command is redirected to an actuator other than its intended
receiver, or if fake or wrong source information for a sensor measurement is reported to the
central controller. Intra-message integrity means specifically that the content of a message is
genuine. Inter-message integrity means that, to assure data integrity, the protocol must prevent
an adversary from constructing unauthentic messages, modifying messages that are in transit,
reordering messages, replaying old messages, or destroying messages without detection.
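
One common way to obtain both kinds of integrity is to authenticate each message together with
a sequence number. The Python sketch below illustrates the idea using HMAC-SHA256 from the
standard library; it is an assumption-laden example rather than a protocol defined in this
chapter. The function seal, the class Receiver, the 8-byte sequence header and the strict
in-order policy are all hypothetical choices, and key distribution is assumed to be solved
elsewhere.

```python
import hashlib
import hmac
import struct
from typing import Optional

TAG_LEN = 32  # length of an HMAC-SHA256 tag in bytes
HDR_LEN = 8   # length of the big-endian sequence number header


def seal(key: bytes, seq: int, payload: bytes) -> bytes:
    """Prefix the payload with a sequence number and append an HMAC tag."""
    header = struct.pack(">Q", seq)
    tag = hmac.new(key, header + payload, hashlib.sha256).digest()
    return header + payload + tag


class Receiver:
    """Accepts only authentic messages that arrive in strict sequence order."""

    def __init__(self, key: bytes) -> None:
        self.key = key
        self.next_seq = 0

    def open(self, message: bytes) -> Optional[bytes]:
        if len(message) < HDR_LEN + TAG_LEN:
            return None
        header = message[:HDR_LEN]
        payload = message[HDR_LEN:-TAG_LEN]
        tag = message[-TAG_LEN:]
        expected = hmac.new(self.key, header + payload, hashlib.sha256).digest()
        # Intra-message integrity: forged or modified content fails here.
        if not hmac.compare_digest(tag, expected):
            return None
        (seq,) = struct.unpack(">Q", header)
        # Inter-message integrity: a replayed, reordered, or skipped
        # (lost or destroyed) message shows up as an unexpected number.
        if seq != self.next_seq:
            return None
        self.next_seq += 1
        return payload
```

The tag covers the header as well as the payload, so an adversary cannot renumber a captured
message. The strict in-order policy is deliberately conservative; a deployed protocol would need
a recovery path for legitimately lost messages, which this sketch omits.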
Confidentiality

means that unauthorized persons should not have any access to information related to the
specific SCADA system. At the current stage, this need is dwarfed by the desirability of
availability in a control-performance-centric setting. SCADA systems measure and control physical
processes that are generally of a continuous nature, with commands and responses that are simple
and repetitive. Thus the messages in SCADA systems are relatively easy to predict. Hence
confidentiality is secondary in importance to data integrity.

However, the confidentiality of critical information such as passwords, encryption keys and
detailed system layout maps shall rank high when it comes to security concerns in industry.
Appropriate reinforcement should be imposed in this respect. Also, information about the physical
quantities that flow through the control algorithm may leak critical information through
side-channel attacks.

The  drastic  difference  in  the  ordering  of  desired  security  properties  is  mostly  due  to  that 
SCADA  systems  are  demanded  to  be  real-time  operating  and  continuously  functioning. 

Graceful  Degradation 

requires the system to be capable of keeping the attack impact local and confining tainted
data flow within the tainted region, without further escalation into a full-scale, system-wide
cascading event.

Again, these desired security properties are not mutually exclusive but closely related. For
example, by breaching integrity, an adversary can change control signals to cause a device
malfunction, which might ultimately affect the availability of the network. Overall, tightly
enforced access control may deliver confidentiality, integrity, availability, timeliness and
graceful degradation as well.

2.2.2  Trust  Model 

Given that we focus on cyber attacks on SCADA systems, we restrict our attention to attacks
mounted through cyber means 3 and assume that basic physical security is provided. In particular,
the SCADA server or Master Terminal Unit is physically secure, i.e., we assume there is no direct
physical tampering with the server where the main control and estimation algorithms reside.
Brute-force physical sabotage, such as cutting communication and power supply wires and cables,
hammering devices or radio jamming, is outside the scope of this paper.

Furthermore,  we  assume  that  the  control  and  estimation  algorithms  are  programmed  securely. 

3 As stated in previous sections, these cyber attacks most likely result in physical destruction in
SCADA systems.

2.2.3 Threat Model

Typical threats to sensor networks and to conventional IT systems are also threats to SCADA
systems if the adversary has the means to exploit the vulnerabilities of SCADA systems4. The
adversary sources include, but are not limited to, hostile governments, terrorist groups, foreign
intelligence services, industrial spies, criminal groups, disgruntled employees, bot-network
operators, phishers, spyware/malware authors, spammers, and attackers [80]. We assume that attacks
come from only one side of the SCADA center and that there is no collusion.


2.3  Vulnerability 

Current common practice in SCADA systems leaves the window open to various vulnerabilities.
To name a few, the entrenched factors include publicly available information about a company’s
network infrastructure, insecure network architecture, operating system vulnerabilities that open
trap doors to unauthorized users, and the use of wireless devices. In particular, the lack of
real-time monitoring and proper encryption is very detrimental.

Cyber attacks on SCADA systems can take routes through Internet connections, business or
enterprise network connections, and/or connections to other networks, to the layer of control
networks and then down to the level of field devices. More specifically, the common attack vectors
are

•  Backdoors  and  holes  in  network  perimeter 

•  Vulnerabilities  in  common  protocols 

•  Attacks  on  field  devices  through  cyber  means 

•  Database  attacks 

•  Communications  hijacking  and  Man-in-the-middle  attacks 

•  Cinderella  attack  on  time  provision  and  synchronization 

From the point of view of a control engineer, possible attacks can be grouped into the following
categories

• bogus input data to the controller introduced by compromised sensors and/or an exploited
network link between the controller and the sensors

• manipulated and misleading output data to the actuators/reactors from the controller due
to tampered actuators/reactors or a compromised network link between the controller and the
actuators

• controller historian

• Denial of Service - missing the deadlines of needed task actions.

4 Note we are making a rather conservative assumption in light of exploring the potentials of cyber security issues
in the SCADA system domain. Any further suitable and refined threat model depends on the cost effectiveness of the
security measures.

There is still little reported information about actual SCADA attacks or scenarios designed
by red teams, despite the growing awareness of security issues in industrial networks. However,
by leveraging existing solutions for and understanding of conventional IT systems, we use the
SCADA hierarchy as a reference plane. Cyber attacks can then be classified into the following
categories.


2.4  Cyber  Attacks  on  Hardware 

An attacker might gain unauthenticated remote access to devices and change their data set points.
This can cause devices to fail at a very low threshold value or an alarm not to go off when it should.
Another possibility is that the attacker, after gaining unauthenticated access, could change the
operator display values so that when an alarm actually goes off, the human operator is unaware of
it. This could delay the human response to an emergency, which might adversely affect the safety
of people in the vicinity of the plant. Some of the detailed procedures for carrying out such
attacks are given in a later section, where we describe specific SCADA protocols.

The main issue in preventing cyber attacks on hardware is access control. With that in mind,
we should mention one of the representative attacks in this category, namely the doorknob-rattling
attack. The adversary tries a very small number of common username and password combinations on
several computers, which results in very few failed login attempts on each of them. This attack
can go undetected unless the data related to login failures from all the hosts are collected and
aggregated to check for doorknob-rattling from any remote host.
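
The aggregation step can be illustrated with a short Python sketch. It assumes a hypothetical log
format in which every failed login is reported as a (source address, target host) pair; the
function name and the threshold `min_hosts` are likewise illustrative choices rather than part of
any particular IDS.

```python
from collections import defaultdict


def find_doorknob_rattlers(failed_logins, min_hosts=3):
    """Return {source: number of distinct hosts} for suspicious sources.

    failed_logins is an iterable of (source_address, target_host) pairs,
    one per failed login, pooled from every monitored host.
    """
    hosts_by_source = defaultdict(set)
    for source, host in failed_logins:
        hosts_by_source[source].add(host)
    # A source is suspicious when it failed on many different hosts,
    # even if each host individually saw only one or two failures.
    return {
        source: len(hosts)
        for source, hosts in hosts_by_source.items()
        if len(hosts) >= min_hosts
    }
```

A per-host failure counter would stay below its alarm threshold in this situation; only the pooled
view across hosts exposes the pattern.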


2.5  Attacks  on  Software 

As listed in earlier sections, a SCADA system employs a variety of software to meet its
functionality demands. There are also large databases residing in data historians, besides many
relational database applications used in corporate and plant sessions.

Hosting centralized databases, data historians contain vital and potentially confidential process
information. These data are indispensable not only for technical reasons, such as the fact that
many control algorithms rely on past process data to make correct decisions, but also for business
purposes, such as electricity pricing.

Although we have assumed that the algorithms of this software are trustworthy, there are still
vulnerabilities associated with their implementations. The most common implementation flaw is
the buffer overflow, among others such as format string and integer overflow flaws. The fact that
most control applications are written in C requires us to take extra precaution with this
vulnerability.

2.5.1  No  Privilege  Separation  in  Embedded  Operating  System 

VxWorks, a platform developed by Wind River Systems, which has since been acquired by Intel
[190], was the most popular embedded operating system in 2005 and claimed 300 million devices
in 2006 [212]. VxWorks has been used to power everything from the Apple Airport
Extreme access points to the Mars rovers and the C-130 Hercules aircraft [182]. VxWorks itself is
essentially a monolithic kernel with applications implemented as kernel tasks. This means that all
tasks generally run with the highest privileges and there is little memory protection between these
tasks.

2.5.2  Buffer  Overflow 

Many attacks boil down to causing a buffer overflow as the eventual means to corrupt the
intended behavior of a program and cause it to run amok. Some general methods are stack smashing
and function pointer manipulation.

The effects of such attacks can take forms such as resetting passwords, modifying content,
running malicious code and so on.

The buffer overflow problem in SCADA systems has two fronts. One front is on the workstations
and servers, which are similar to standard IT systems.

For example, WellinTech KingView 6.53 HistorySvr, industrial automation software for
historian servers widely used in China, has a heap buffer overflow vulnerability that could
potentially lead to a Stuxnet-type mishap if not patched [32].

The other front manifests itself in field devices and other components that rely on an RTOS
and therefore inherit its memory-related weaknesses. Exploits can take advantage of the fixed
memory allocation time requirement in RTOS systems to succeed more reliably. Moreover,
many field devices run for years without rebooting. Therefore, these SCADA components,
especially in legacy networks, are subject to accumulated memory fragmentation, which leads to
program stalls.

The Hardware/Software Address Protection (HSAP) technique offered by [246] includes a
hardware boundary check method and a function pointer XOR method to deal with stack smashing
attacks and function pointer attacks in embedded systems, respectively.

2.5.3  SQL  Injection 

Most small and industrial-strength database applications can be accessed using Structured
Query Language (SQL) statements for structural modification and content manipulation. In light
of data historians and web accessibility in current SCADA systems, SQL injection, one of the top
Web attacks, has very strong implications for the security of SCADA systems.

The typical unit of execution in SQL, which comes in many dialects loosely based on the
SQL-92 ANSI standard, is the query, a collection of statements that typically returns a single
result set. SQL injection occurs when an adversary is able to manipulate the data input into a Web
application that fails to properly sanitize user-supplied input, and thereby to insert a series of
unexpected SQL statements into a query. It is thus possible to manipulate a database in several
unanticipated ways. Moreover, if a “command shell” stored procedure is enabled, an attacker can go
further and reach the command prompt level. The resulting process will run with the same
permissions as the component that executed the command. This attack can allow attackers to gain
total control of the database or even execute commands on the system.
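
The injection mechanism described above can be reproduced in a few lines. The sketch below uses
Python's built-in sqlite3 module with a made-up `tags` table standing in for a historian database;
the table, its contents and the function names are assumptions for illustration only. It contrasts
a query assembled by string concatenation with a parameterized query.

```python
import sqlite3


def make_db():
    """Create a toy in-memory 'historian' table (illustrative schema)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE tags (name TEXT, value REAL)")
    db.executemany(
        "INSERT INTO tags VALUES (?, ?)",
        [("pump1_flow", 12.5), ("tank2_level", 3.1)],
    )
    return db


def lookup_unsafe(db, name):
    # Vulnerable: user input is spliced directly into the SQL text,
    # so quote characters in the input can change the query's logic.
    query = "SELECT name, value FROM tags WHERE name = '" + name + "'"
    return db.execute(query).fetchall()


def lookup_safe(db, name):
    # Parameterized: the driver passes the input as data, never as SQL.
    query = "SELECT name, value FROM tags WHERE name = ?"
    return db.execute(query, (name,)).fetchall()
```

Passing the input `x' OR '1'='1` to `lookup_unsafe` turns the condition into one that is always
true and returns every row, whereas `lookup_safe` treats the same string as an ordinary (and
non-existent) tag name.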
In the case studied in [206], the stored procedure in SQL Server (shown in Fig. 2.2) is
enabled by default. Thus an attacker can still get into the SCADA system even though two LAN
cards are installed.


Figure  2.2:  SQL  Attack 

Intentionally  malicious  changes  to  databases  can  cause  catastrophic  damage. 


2.6  Attacks  on  the  Communication  Stack 

We break down the attacks on the communication stack using the TCP/IP, or Internet,
reference model and highlight those that may have more potential to harm SCADA systems,
in particular on the network layer, the transport layer, the application layer and the
implementation of protocols.

The UDP back door on port 0x4321 on thousands of devices has been publicly known since at
least spring 2002.

There are many well-known TCP/IP attacks in the literature; readers may refer to [115, 104] for
more details.

2.6.1  Network  Layer 

Diagnostic  Server  Attacks  through  UDP  port 

Adversaries have access to the same debugging tools that RTOS developers do. They
can read symbol tables, step through the assembly, etc., and many attacks do not even require
code-level knowledge. For example, the weak default hashing algorithm in the standard
authentication API of Wind River Systems VxWorks is susceptible to collisions, so an
attacker can brute-force a password by guessing a string that produces the same hash as a legitimate
password 5. Alternatively, through the VxWorks debug service, which runs over UDP on port 17185
and is enabled by default, an attacker can carry out attacks such as remote memory dump, remote
memory patch, remote calls to functions and remote task management 6 without any authentication
while maintaining a certain level of stealthiness.

The VxWorks Wind DeBug (WDB) protocol is an RPC-based protocol that uses UDP. It can be
exploited over the Internet by downloading hacking software and adding targets to a host list
before running the script.

5 US-CERT VU#840249.

Idle  Scan 

is a blind port scan performed by bouncing off a dumb “zombie” host, often in preparation for an
attack. Both MODBUS and DNP3 have scan functionalities prone to such attacks when they are
encapsulated for running over TCP/IP.

Smurf 

is a type of address spoofing, in general carried out by sending a continuous stream of modified
Internet Control Message Protocol (ICMP) packets to the target network, with the sending address
identical to one of the target computer addresses. In the context of SCADA systems, if a PLC acts
on the modified message, it may either crash or dangerously send out wrong commands to actuators.

Address  Resolution  Protocol  (ARP)  Spoofing/Poisoning 

The ARP is primarily used to translate IP addresses to Ethernet Medium Access Control
(MAC) addresses and to discover other connected devices on the LAN. An ARP spoofing
attack modifies the cached address pair information.

By sending fake ARP messages that contain false MAC addresses in SCADA systems, an
adversary can confuse network devices, such as network switches. When these frames are falsely
sent to another node, packets can be sniffed; when they are sent to an unreachable host, a DoS is
launched; and when they are intentionally sent to a host connected to different actuators,
physical disasters of different scales can be initiated.

Static MAC address assignment is one of the countermeasures. However, certain network switches
do not allow a static setting for a pair of MAC and IP addresses. Segmentation of the network may
also alleviate the problem, in that such attacks can only take place within the same subnet.
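
Where static entries cannot be configured on the switch, a monitoring host can approximate the
same check in software. The following Python sketch keeps a table of IP-to-MAC bindings and flags
any ARP observation that contradicts it. The class name `ArpWatch`, its return values and the way
observations are fed in are all hypothetical; capturing ARP traffic itself is outside the sketch.

```python
class ArpWatch:
    """Tracks IP-to-MAC bindings and reports contradictions."""

    def __init__(self, static_bindings=None):
        # Optional trusted bindings, e.g. for PLCs and the SCADA server.
        source = static_bindings or {}
        self.bindings = {ip: mac.lower() for ip, mac in source.items()}

    def observe(self, ip, mac):
        """Classify one observed (IP, MAC) pair as 'new', 'ok' or 'conflict'."""
        mac = mac.lower()
        known = self.bindings.get(ip)
        if known is None:
            # First sighting: learn the binding (trust-on-first-use).
            self.bindings[ip] = mac
            return "new"
        if known == mac:
            return "ok"
        # The IP address is now claimed by a different MAC address:
        # a possible ARP spoofing attempt. The stored binding is kept.
        return "conflict"
```

Learning unknown bindings on first use is a convenience assumption; a stricter deployment would
load all bindings in advance and treat any unknown address as suspicious.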

Chain/Loop  Attack 

In a chain attack, there is a chain of connections through many nodes, as the adversary moves
across multiple nodes to hide his origin and identity. In a loop attack, the chain of connections
forms a loop, making it even harder to track down his origin in a wide SCADA system.

6 US-CERT VU#362332

2.6.2 Transport Layer

A SYN flood saturates resources by sending TCP connection requests faster than a machine
can process them.

SCADA protocols, particularly those running on top of transport protocols such as TCP/IP,
have vulnerabilities that could be exploited by an attacker through methods as simple as injecting
malformed packets, causing the receiving device to respond or communicate in inappropriate
ways and resulting in the operator completely losing view or control of the control device.

2.6.3  Application  Layer 

Currently, there is no strong security control in the protocols used in SCADA systems, such as
DNP3 without secure authentication, Modbus, Object Linking and Embedding (OLE) for Process
Control (OPC), and Inter-Control Center Communications Protocol (ICCP). There is practically no
authentication of source and data, so that those who have access to a device through a SCADA
protocol can often read and write as well. The write access and diagnostic functions of these
protocols are particularly vulnerable to cyber and cyber-induced physical attacks.

One possible attack in both SCADA and conventional IT systems is DNS forgery. In such an
attack, the adversary sends a fake DNS reply with a matching source IP, destination port and
request ID, but with attacker-manipulated information inside, so that this fake reply may be
processed by the client before the real reply is received from the real DNS server. For more
details on attacks studied in conventional IT systems, please refer to [104].

Next, we list potential attacks associated with more SCADA-specific protocols.

MODBUS 

Modbus [187] is a de facto standard application layer protocol used in industrial networks.
It comes in different flavors, from plain Modbus to Modbus+ to Modbus/TCP. A Modbus client
(or master) can send a request to a Modbus server (or slave)7 with a function code that specifies the
action to be taken and a data field that provides additional information. The general Modbus
frame is shown in Figure 2.3.


Figure 2.3: A typical Modbus frame (labels: Application Data Unit, Additional address, Protocol Data Unit)

Among the few published accounts of attacks against Modbus, Digital Bond [210]
has conducted intrusion detection work studying its potential weaknesses. Their detection rules
include denial of service (e.g., rebooting Modbus servers, configuring them to provide no service,
called listen-only mode, and crashing servers with an oversized request), reconnaissance (e.g.,
unauthorized reading of data and gathering device information), and unauthorized write requests.

7 Initially, Modbus was a master-slave protocol for serial buses. When implementing Modbus over TCP, a Modbus
master is a TCP client, and a Modbus slave is a TCP server.

Byres and his company have used the Achilles Vulnerability Test Platform to perform security
tests on Modbus to discover vulnerabilities [42, 43].

Given that Modbus does not have encryption or any other security measures, there are many
ways to directly exploit such weaknesses at the function code level. The function codes 0x05 and
0x0F are used to write a single output or multiple outputs (coils), respectively, to either ON or
OFF in a remote device. This means that an adversary can turn off and suppress output(s) remotely
and thus create a false sense of the situation at the HMI end. Unauthorized writes can be
accomplished using function codes 0x06 and 0x10, by which forged data may be written to either a
single register or multiple registers in a remote device, respectively. If Modbus is implemented
on a serial line, function code 0x11 can be used to gather information from a remote device, such
as a controller’s description. Function code 0x08 is used for diagnostics on a serial line.
However, combined with subfunction code 0x01, it can initialize and restart the slave (server)
port and clear the communication event counter, which makes it an ideal attack vector. When
combined with subfunction code 0x04, the diagnostics function code can force a remote device into
its Listen Only Mode. Similarly, Modbus+ has a function code (08) for log cleaning that can enable
an attacker to clear statistics on data manipulation and denial of service events.
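
To make the function-code discussion concrete, the following Python sketch inspects a Modbus/TCP
request and reports whether it carries one of the function codes named above. It assumes the
standard Modbus/TCP framing (a 7-byte MBAP header followed by the function code); the function
name `classify`, the table names and the returned labels are illustrative, and the sketch is not
a reconstruction of the Digital Bond rules.

```python
import struct

# Function codes discussed in the text, with short descriptive labels.
SENSITIVE_CODES = {
    0x05: "write single coil",
    0x0F: "write multiple coils",
    0x06: "write single register",
    0x10: "write multiple registers",
    0x11: "report server ID",
}

# Subfunctions of the diagnostics function code 0x08.
DIAGNOSTIC_SUBFUNCTIONS = {
    0x0001: "restart communications",
    0x0004: "force listen-only mode",
}


def classify(adu: bytes):
    """Return a label for a sensitive Modbus/TCP request, else None."""
    if len(adu) < 8:
        return None
    # MBAP header: transaction id, protocol id, length, unit id.
    _tid, protocol_id, _length, _unit = struct.unpack(">HHHB", adu[:7])
    if protocol_id != 0:  # protocol id 0 identifies Modbus
        return None
    function_code = adu[7]
    if function_code == 0x08:
        if len(adu) < 10:
            return None
        (subfunction,) = struct.unpack(">H", adu[8:10])
        label = DIAGNOSTIC_SUBFUNCTIONS.get(subfunction)
        return "diagnostics: " + label if label else None
    return SENSITIVE_CODES.get(function_code)
```

A monitor built on such a check would still need a notion of which clients are authorized to
issue these requests, since the same function codes appear in legitimate traffic.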

DNP3 

DNP3 is used between master control stations and remote computers or controllers, called
outstations, in the electric utility industry and by water companies. DNP3 is implemented by
several manufacturers due to its small memory consumption. Its function code 0x0D can reset and
reconfigure DNP3 outstations by forcing them to perform a complete power cycle. During the
re-initialization to default values, many devices clear all queues as well. An attacker can take
advantage of this property to cause a delay in outstations before they accept requests again.
Furthermore, function code 0x13 enables loading new outstation configurations. With unauthorized
access, an attacker can reconfigure the remote devices with manipulated setting values, suppress
output and/or create false alarms.

2.6.4  Attacks  on  Implementation  of  Protocols 

Protocol  vulnerabilities  can  reveal  themselves  as  segmentation  faults,  stack,  heap  or  buffer 
overflows,  etc.,  all  of  which  can  cause  the  protocol  implementation  to  fail  resulting  in  a  potential 
exploit. 

Meanwhile,  certain  protocol  implementations,  such  as  ICCP  servers,  only  allow  users  to  read 
values,  and  there  are  a  number  of  protocols  that  are  in  the  process  of  adding  security  controls  to 
address  this  deficiency. 

Nevertheless, [210] argues that SCADA implementation vulnerabilities are more important
than the lack of security controls in SCADA protocols.

TCP/IP

First of all, in light of the migration from UNIX to Windows as the operating system used by
many sectors in SCADA systems, there are several attacks that specifically exploit the
implementation of the TCP/IP protocols in Windows. Although patches are available, these machines
are constrained to be on-line continuously, so it is very likely that they do not have up-to-date
patches. Here, we only name a few well-known ones.

• WinNuke takes advantage of flawed handling of the URG status flag in the TCP implementation.

• TearDrop/NewTear and Ssping exploit implementation errors in fragmentation handling in the
TCP/IP protocol.

A  nightmare  scenario  can  be  that  one  company’s  network  is  compromised  and  a  polymorphic 
worm  takes  down  most  servers  and  any  unpatched  SCADA  servers  running  Windows. 

Secondly,  these  protocol  stacks  can  and  do  suffer  from  various  vulnerabilities  commonly  found 
due  to  poor  software  design  and  coding  practices. 

OPC 

OPC  servers  use  Microsoft’s  OLE  technology8  to  provide  real-time  information  exchange  be¬ 
tween  software  applications  and  process  hardware. 

At the OPC interface level, the item write function takes two parameters: an item handle and
a value to write to it. If the server maps handles to memory addresses and fails to validate a
client-provided handle, the IO interface’s write function allows an attacker to write any value to
any memory address, a primitive which can easily be exploited to run arbitrary code on the server
(e.g., through stack return addresses). An even larger issue is that an OPC server can be remotely
compromised and used to launch attacks on other systems. Because OPC servers are often exposed
in the Demilitarized Zone (DMZ), this could form a communication chain that allows control
system exploitation from the enterprise network or the Internet.
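
The handle-validation issue can be illustrated independently of OPC. In the Python sketch below,
handles are opaque keys into a server-side table, and a write is refused unless the handle was
previously issued by the server. The class `ItemTable` and its methods are hypothetical and do not
correspond to the actual OPC interfaces; Python also cannot express the raw memory write itself,
so the sketch shows only the defensive pattern.

```python
class ItemTable:
    """Server-side registry that maps opaque handles to items."""

    def __init__(self):
        self._items = {}
        self._next_handle = 1

    def add_item(self, name, value=0.0):
        """Register an item and return the handle given to the client."""
        handle = self._next_handle
        self._next_handle += 1
        self._items[handle] = {"name": name, "value": value}
        return handle

    def write(self, handle, value):
        """Write a value, but only through a handle the server issued."""
        item = self._items.get(handle)
        if item is None:
            # A client-supplied handle is never used as an address.
            raise KeyError("unknown item handle")
        item["value"] = value

    def read(self, handle):
        """Return the current value for a handle the server issued."""
        item = self._items.get(handle)
        if item is None:
            raise KeyError("unknown item handle")
        return item["value"]
```

The design point is that the handle is looked up rather than dereferenced, so an arbitrary
client-chosen number cannot select an arbitrary memory location.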

[27] gives three possible OPC attack scenarios, all of which are associated with extra open
ports:


•  Collateral  Damage  by  OPC-Unaware  Malware; 

•  Opportunistic  OPC  Denial  of  Service  Attack; 

•  Intelligent,  aggressive  attack  against  OPC  hosts  through  a  man-in-the-middle  (MITM)  tech¬ 
nique 


ICCP 

The  most  serious  and  exposed  SCADA  protocol  stacks  are  those  that  are  used  to  exchange 
information  with  business  partners,  such  as  ICCP,  or  those  used  to  exchange  information  between 
the  corporate  network  and  control  center  network. 

8 Also known as the Component Object Model, or COM

According to the LiveData ICCP Server white paper [268], the LiveData ICCP server contains a
heap-based buffer overflow: the LiveData implementation of ISO Transport Service over TCP
(RFC 1006) is vulnerable. By sending a specially crafted packet to a vulnerable LiveData RFC 1006
implementation, a remote attacker may be able to trigger the overflow to execute arbitrary code
or crash a LiveData ICCP Server, causing a denial of service.

UCA 

UCA was expected to be a more robust standard than DNP3 when the Electric Power Research
Institute (EPRI) decided to use it to serve the SCADA needs of the electric utilities. It is based
on the Manufacturing Message Specification from ISO standard 9506.

MMS 

Tamarack MMSd is an implementation of the Manufacturing Message Specification (MMS)
protocol, an international standard (ISO 9506) dealing with a messaging system for transferring
real-time process data and supervisory control information between networked field devices and/or
computer applications.

Tamarack MMSd9 components do not properly handle malformed RFC 1006 packets either.
This vulnerability may allow a remote, unauthenticated attacker to cause a denial of service
condition.


2.7  Discussion 

The  cyber-physical  security  of  real-time,  continuous  systems  necessitates  a  comprehensive 
view  and  holistic  understanding  of  network  security,  control  theory  and  the  physical  system.  Ul¬ 
timately,  any  viable  technical  solutions  and  research  directions  in  securing  SCADA  systems  must 
lie  in  the  conjunction  of  computer  security,  communication  network  and  control  engineering.  The 
idea  of  looking  into  the  problem  in  the  context  of  control  performance  holds  its  solid  bearings. 
However,  the  very  large  installed  base  of  such  systems  means  that  in  many  instances  we  must 
for  a  long  time  to  come  rely  on  retrofitted  security  mechanisms,  rather  than  having  the  option  to 
design  them  in  from  scratch.  This  leads  to  a  pressing  need  for  robust  SCADA-specific  intrusion 
detection  systems  (IDS)  and  resilient  control. 

Our next step is to categorize the attacks in terms of their manifestation and realization in order
to shed more light on intrusion prevention and detection.


9 Vulnerability Note VU#372878

Chapter 3

SCADA-specific Intrusion
Detection/Prevention Systems: A Survey
and Taxonomy


Due to standardization and connectivity to the Internet, Supervisory Control and Data
Acquisition (SCADA) systems now face the threat of cyber attacks. SCADA systems were designed
without cyber security in mind, and hence the problem of how to modify conventional Information
Technology (IT) intrusion detection techniques to suit the needs of SCADA is a big challenge. We
explain the nuances associated with the task of SCADA-specific intrusion detection and frame it
in terms of interest to control researchers to illuminate the problem space. We present a taxonomy
and a set of metrics for SCADA-specific intrusion detection techniques, highlighting their
possible use in SCADA systems. In particular, we enumerate the Intrusion Detection Systems (IDS)
that have been proposed to undertake this endeavor. We draw upon the discussion to identify
the deficits and voids in current research. Finally, based upon our taxonomy and analysis, we
offer recommendations and future research avenues regarding which SCADA-specific IDS strategies
are most likely to succeed, in part by presenting a prototype of our efforts towards this goal.

As defined by IEEE Standard C37.1-1994 [45], a Supervisory Control and Data Acquisition
(SCADA) system includes all control, indication, and associated telemetering equipment at the
master station, and all of the complementary devices at the Remote Terminal Unit(s) (RTUs)1. A
typical SCADA system includes hardware, software and communication protocols that connect
together the different layers in the hierarchy. For a more detailed exposition of SCADA system
composition, readers may refer to resources such as [256, 153].

Being one of the primary categories of control systems, SCADA systems are generally used for
large, geographically dispersed distribution operations, such as electrical power grids, petroleum
and gas pipelines, water and wastewater (sewage) systems and other critical infrastructures [256].
They not only provide management with remote access to real-time data from Distributed Control
Systems (DCSs) and Programmable Logic Controllers (PLCs) but also enable the operational control
center to issue automated or operator-driven supervisory commands to remote station control
devices.

1 RTUs are special purpose data acquisition and control units designed to support SCADA remote stations. These
field devices are often equipped with wireless radio interfaces to support remote situations where wire-based
communications are unavailable.

One of the enabling elements in SCADA systems is the set of various communication
protocols employed within the hierarchical system [12, 64, 153]. Their functionalities range from
processing raw data transmission to handling high-level exchange between different networks
and domains. These protocols have strong implications for the security of SCADA systems. We
name a few of the most popular ones: Modbus, Profibus, Distributed Network Protocol (DNP3),
Utility Communications Architecture (UCA), Foundation Fieldbus, Common Industrial Protocol
(CIP), Controller Area Network (CAN), Object Linking and Embedding (OLE) for Process
Control (OPC) and Inter-Control Center Communications Protocol (ICCP) [153].

Most industrial plants now employ networked process historian servers that store process data,
along with other business and process interfaces, such as remote Windows sessions to DCSs or
direct file transfer from PLCs to spreadsheets. This integration of SCADA networks with other
networks has made SCADA systems vulnerable to various cyber threats. The adoption of Ethernet
and TCP/IP for process control networks and of wireless technologies such as IEEE 802.x, Zigbee,
Bluetooth, WiFi, WirelessHART and ISA SP100 [64, 153] has further reduced the isolation of
SCADA networks. The connectivity and de-isolation of SCADA systems are illustrated in
Fig. 2.1.

Furthermore, the recent trend toward standardization of software and hardware used in SCADA
systems [153] potentially makes it even easier to mount SCADA-specific attacks². These attacks
can disrupt and damage critical infrastructure operations, contaminate the ecological environment,
cause major economic losses and, even more dangerously, claim human lives [90, 5, 81]. These
likely “penalty costs” due to lack of protection, together with aversion to loss [138, 267, 242],
push us to seek protection measures with reasonable cost-effectiveness [196].

3.0.1  Why  SCADA-specific  Intrusion  Detection  Systems? 

Had we not started with legacy systems, and were we free of difficulties such as
interoperability [161, 204], we could apply and implement many known security measures directly.
Among them, a sound implementation and viable deployment of an Intrusion Detection System
(IDS) can serve as an add-on intelligence component to existing SCADA systems with minimal
hardware cost or operational changes, leveraging many entrenched SCADA component
infrastructures and technologies.

To this end, the industrial and academic control security community has started to build
IDSs specifically for SCADA systems [49, 191, 195, 204, 230, 233, 262, 263, 287].

Nevertheless, it is important to realize that when we borrow tools from other fields, there are
situations and conditions under which our original set of assumptions might not hold. A SCADA
system differs from a conventional IT system in the following ways [256]: it is a hard real-time
system; its timeliness and availability at all times are critical; and its terminal devices have
limited computing capabilities and memory resources [59]. Additionally, existing SCADA systems
have only weak authentication mechanisms to differentiate human users, and weak privilege
separation and user account management to control access [204]. Such fundamental weaknesses in
access control leave the door open to attacks. These differences challenge the design and
implementation of SCADA-specific IDSs.

2 In this paper, we use the terms intrusion and attack interchangeably.

Meanwhile, among the attempts to date, some authors [49] consider that SCADA systems
usually have a relatively static topology³, presumably regular network traffic⁴ and simple
protocols, and hence that monitoring them may be no more difficult than monitoring enterprise
systems. But such assumptions are not yet fully validated, as barely any of the work mentioned
has been tested on network traffic from real operational SCADA systems. The related details are
discussed in subsequent sections.

Furthermore, the cyber-physical security of real-time, continuous systems necessitates a
comprehensive view and holistic understanding of network security, control theory and physical
systems. The conventional focus and terminology of each field overlap only partially, and each
field has its own interpretation of the overlapping terms. One of the barriers faced by researchers
in IDS for SCADA is the occupational, cultural and terminological difference between IT and
control personnel. Thus this paper aims to convey the idea of intrusion detection and prevention
in the setting of a SCADA system by leveraging the classic control engineering and theory
viewpoint.

The ultimate goal of the much-needed work in this area is to achieve satisfactory control
performance in a continuous 24 x 7, real-time, realistic environment, where normal behavior
co-exists with benign noise, honest mistakes, natural component and/or system faults, and
potential malicious cyber intrusions.

Towards concrete progress beyond generic discussions, it is important for us to survey and
evaluate up-to-date research efforts in this area and reflect on the soundness of the overall
methodologies. We may want to ask:

• Have these techniques and approaches addressed the specific needs of SCADA systems?

• Are we simply being handicapped by the special needs of current SCADA systems in terms of
security engineering efforts?

• Or are we leveraging the entrenched SCADA infrastructure components and technologies?

3.0.2  Contribution 

In  this  paper,  we  make  the  following  contributions: 

• The first systematic and thorough effort to investigate and assess the landscape of
up-to-date SCADA-specific intrusion detection techniques and systems;

• An explanation of the nuances of SCADA-specific IDS, with clear definitions, a taxonomy
and a set of metrics for SCADA-specific IDS;

• Easing the interoperability between conventional IT security and control systems research by
framing the intrusion detection problem in a setting favorable to SCADA systems’ continuous
operation, withstanding the possible presence of adversaries and unintentional faults;

• Bringing in cross-discipline insights to address the special needs entailed by SCADA systems
by leveraging entrenched SCADA components and technologies, and providing future directions;

• A prototype of our efforts in this arena.

3 Under the assumption that there is no wireless sensor network involved.

4 Because operational SCADA traces accessible to the public are scarce, we are conservative and
do not take this leap of faith yet.

3.0.3  Definitions  and  Difficulties  from  Ambiguities 

To resolve the ambiguity of terms that bear different meanings in control theory (including
systems & control and fault detection & isolation) and IT (particularly, operating systems and
security engineering), we unify the terms below to reduce misunderstanding and to highlight the
end goal of providing engineers and researchers with insights into the problems facing
networked control systems [304].

Fault: a non-hostility-induced deviation from the system’s specified behavior, including honest
mistakes made by honest people and component failures or defects.

Anomaly: a malicious and intrusive event, or abnormal yet non-intrusive behavior (including
faulty and noisy/messy actions).

Misuse: includes both malicious and unintentional misuse.

Detection: an alarm issued in the presence of a true anomaly or misuse.

False alarm/positive: an alarm issued in the absence of a real anomaly and/or misuse, when
there is normal traffic/behavior only.

False negative or missed detection: no alarm issued in the presence of a real intrusion.

Note:  Any  large  network  is  a  very  “noisy”  environment  even  at  the  packet  level. 
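
To make the last three definitions concrete, the following minimal sketch tallies alarms against
ground truth and derives the detection rate and the false alarm rate. It is illustrative only:
the names `AlarmCounts`, `tally`, `detection_rate` and `false_alarm_rate` are hypothetical and
do not come from any of the surveyed systems, and the sketch assumes that every event carries a
ground-truth label, which rarely holds for operational traffic.

```python
from dataclasses import dataclass


@dataclass
class AlarmCounts:
    detections: int         # alarm raised, real anomaly or misuse present
    false_alarms: int       # alarm raised, normal traffic/behavior only
    missed_detections: int  # no alarm, real intrusion present
    true_negatives: int     # no alarm, normal traffic/behavior only


def tally(events):
    """Count outcomes from (is_intrusion, alarm_raised) boolean pairs."""
    counts = AlarmCounts(0, 0, 0, 0)
    for is_intrusion, alarm_raised in events:
        if is_intrusion and alarm_raised:
            counts.detections += 1
        elif is_intrusion:
            counts.missed_detections += 1
        elif alarm_raised:
            counts.false_alarms += 1
        else:
            counts.true_negatives += 1
    return counts


def detection_rate(counts):
    """Fraction of real intrusions that raised an alarm."""
    intrusions = counts.detections + counts.missed_detections
    return counts.detections / intrusions if intrusions else 0.0


def false_alarm_rate(counts):
    """Fraction of normal events that raised an alarm."""
    normal = counts.false_alarms + counts.true_negatives
    return counts.false_alarms / normal if normal else 0.0
```

A fault that raises an alarm counts as a detection under the definitions above only if faults
are labeled as anomalies in the ground truth; that labeling choice is left to the evaluator.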

3.0.4  Related  Work 

Since SCADA-specific IDS research is a rather new arena, we resort to the classics of the
standard IT field for reference.

As observed by John McHugh in [176]:

The point is that the taxonomy must be constructed with two objectives in mind:
describing the relevant universe and applying the description to gain insight into the
problem at hand.

Both Stefan Axelsson [15] and John McHugh [177] have done thorough work on the classification
of intrusion detection systems. Many of the evaluation and assessment principles for
SCADA-specific IDS in this paper are derived from their work.

The unified view is to consider intrusion detection as a signal detection problem, as framed by
Stefan Axelsson [16], where normal network traffic is treated as background data. If we view
background data and responses as noise, and attack data and responses as signal, the IDS
problem can be characterized as one of detecting a signal in the presence of noise. This school of
thought is much in line with standard control theory [46].
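
As a worked illustration of this signal-in-noise view, the sketch below assumes a deliberately
simple model that is not taken from [16] or [46]: a monitored feature is zero-mean Gaussian
noise with standard deviation `sigma` under normal traffic, and an attack shifts its mean by
`shift`. An alarm is raised when the mean of `n` samples exceeds a threshold. All function and
parameter names are hypothetical.

```python
import math


def q_function(x):
    """Tail probability of a standard normal variable, P(Z > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def threshold_detector(samples, threshold):
    """Raise an alarm when the sample mean exceeds the threshold."""
    return sum(samples) / len(samples) > threshold


def operating_point(threshold, shift, sigma, n):
    """Return (false alarm probability, detection probability).

    Under the assumed model the sample mean is Gaussian with standard
    deviation sigma / sqrt(n), centered at 0 (noise only) or at `shift`
    (signal present).
    """
    scale = sigma / math.sqrt(n)
    p_false_alarm = q_function(threshold / scale)
    p_detection = q_function((threshold - shift) / scale)
    return p_false_alarm, p_detection
```

Raising the threshold lowers both probabilities at once, which is the basic trade-off between
false alarms and missed detections that any detector of this kind has to negotiate.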

3.1 On Real Time Intrusion Detection Types

We adapt a taxonomy of real-time intrusion detection to facilitate the choice of approach for
control researchers as well.

In  the  early  days  of  IDS  research,  two  major  approaches  known  as  signature  detection  and 
anomaly  detection  were  developed. 

Between these two approaches lie the probabilistic and specification-based methods for
intrusion detection. A probabilistic approach, also termed a statistical or Bayes method [152],
uses probabilistically encoded models of misuse. It has some potential to detect unknown
attacks. A specification-based approach constructs a model of what is allowed, enforces its
predefined policy and raises alerts when the observed behavior falls outside this model. It has a
high potential for generalization and leverage against new attacks [20]. This technique has been
proposed as a promising alternative that combines the strengths of signature-based and
anomaly-based detection.

Instead of looking for deviations and unknowns, a specification-based method [20, 148] defines
what is allowable in terms of network traffic behavior/patterns. This method sounds promising,
but it might be tedious to enumerate all allowable patterns.
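
The specification-based idea can be sketched as an explicit allow-list: anything not enumerated
raises an alert. The example below is a hypothetical illustration rather than the method of
[20] or [148]; the `Rule` fields, the station names and the value ranges are all assumptions
chosen for brevity. It also shows why enumeration becomes tedious, since every legitimate
(source, destination, operation) combination needs its own rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    source: str       # hypothetical station identifier
    destination: str  # hypothetical device identifier
    operation: str    # e.g. "read" or "write_setpoint"
    low: float        # smallest value the specification allows
    high: float       # largest value the specification allows


class Specification:
    """Allow-list of permitted operations; everything else is a violation."""

    def __init__(self, rules):
        self._rules = {(r.source, r.destination, r.operation): r for r in rules}

    def check(self, source, destination, operation, value):
        """Return None if allowed, otherwise a short alert message."""
        rule = self._rules.get((source, destination, operation))
        if rule is None:
            return "ALERT: operation not in specification"
        if not rule.low <= value <= rule.high:
            return "ALERT: value outside specified range"
        return None
```

For instance, a specification holding the single rule `Rule("hmi", "pump1", "write_setpoint",
0.0, 100.0)` accepts a setpoint of 50 from `hmi` and flags the same command from any other
source, as well as a setpoint of 150 from `hmi`.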

Complementary to the above knowledge-based classification, there are also behavioral detection
approaches⁵. They capture behavior patterns associated with certain attacks, which are not
necessarily illegitimate in a semantic sense. They may also abstract allowable normal
interaction. Such methods are quite promising, especially in conjunction with other methods [290].

Table  3.1  gives  the  overall  comparison. 


Knowledge based
or behavioral
based            Approach       Basis                       Attacks Detected            Generalization
---------------  -------------  --------------------------  --------------------------  --------------
Knowledge        Signature      Misuse                      Known                       No
Knowledge        Anomaly        Learned models of normal    Must appear anomalous       Yes
Knowledge        Probabilistic  Model learning              Match patterns of misuse    Some
Hybrid           Specification  Construct normal model      Must violate specs          Yes
Behavioral       Behavioral     Capture behavioral pattern  Match patterns of behavior  Yes

Table 3.1: Comparison of Intrusion Detection System Approaches


5 A thoroughly stringent and meticulous categorization is not the focus of this paper. Interested
readers may refer to [15, 177] for more detailed taxonomies of IDS.

3.2 Proposed SCADA-specific Intrusion Detection/Prevention Systems

3.2.1  Model-Based  IDS  for  SCADA  Using  Modbus/TCP 

The group at SRI [49] adapted the specification-based approach for intrusion detection to
SCADA systems that rely on Modbus/TCP. This work yields a multi-algorithm IDS appliance
containing pattern anomaly recognition, Bayes analysis of TCP headers, and stateful protocol
monitoring, complemented with customized Snort rules. Alerts are forwarded to a correlation
framework.

They offer three model-based techniques to characterize the expected/acceptable system
behavior according to the Modbus/TCP specification and to detect potential attacks that violate
these models.
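
To give a flavor of what a protocol-level model can check, the sketch below validates a few
fields of a Modbus/TCP frame against the published frame layout: a 7-byte MBAP header
(transaction identifier, protocol identifier, length, unit identifier) followed by the function
code. It is a minimal illustration and not the rule set of [49]; the function name
`check_modbus_tcp` and the allow-list of function codes are assumptions standing in for a
site-specific policy.

```python
import struct

MBAP_HEADER_LEN = 7   # transaction id (2) + protocol id (2) + length (2) + unit id (1)
MAX_ADU_LEN = 260     # MBAP header plus the largest Modbus PDU (253 bytes)

# Assumed site policy: only basic read/write function codes are expected.
ALLOWED_FUNCTION_CODES = {1, 2, 3, 4, 5, 6, 15, 16}


def check_modbus_tcp(frame: bytes):
    """Return a list of specification violations found in one frame."""
    if len(frame) < MBAP_HEADER_LEN + 1:
        return ["frame too short for MBAP header and function code"]
    violations = []
    if len(frame) > MAX_ADU_LEN:
        violations.append("frame longer than the maximum Modbus/TCP ADU")
    _transaction_id, protocol_id, length, _unit_id = struct.unpack(
        ">HHHB", frame[:MBAP_HEADER_LEN]
    )
    if protocol_id != 0:
        violations.append("protocol identifier is not 0 (Modbus)")
    # The length field counts the unit identifier and every byte after it.
    if length != len(frame) - 6:
        violations.append("length field does not match the frame size")
    function_code = frame[MBAP_HEADER_LEN]
    if function_code not in ALLOWED_FUNCTION_CODES:
        violations.append("function code %d not allowed by policy" % function_code)
    return violations
```

A check of this kind is stateless; the stateful monitoring mentioned above would additionally
track request/response pairs across frames, which this sketch does not attempt.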

3.2.2  Anomaly-Based  Intrusion  Detection 

We  discuss  two  anomaly-based  intrusion  detection  systems  in  this  section. 

AutoAssociative Kernel Regression (AAKR) and Sequential Probability Ratio Test (SPRT)

Yang et al. [287] use the AutoAssociative Kernel Regression (AAKR) model coupled with the
Sequential Probability Ratio Test (SPRT) and apply them to a simulated SCADA system.

The fundamental methodology is pattern matching. Predetermined features representing
network traffic and hardware operating statistics are used by the AAKR model to predict the
“correct” behavior. New observations are then compared with past observations denoted as normal
behavior. The comparison residuals are fed into the SPRT to determine whether the observed
behavior is anomalous.
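
The two stages can be sketched as follows. This is a textbook-style illustration under stated
assumptions, not the implementation of [287]: the AAKR prediction is taken to be a
Gaussian-kernel weighted average of stored normal observations, and the SPRT is Wald's test for
a mean shift of known size `shift` in Gaussian residuals with known `sigma`. The names
`aakr_predict` and `GaussianSPRT`, and all parameters, are hypothetical.

```python
import math


def _dist_sq(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def aakr_predict(memory, query, bandwidth):
    """Predict the 'correct' feature vector as a kernel-weighted average
    of the stored normal observations in `memory`."""
    weights = [math.exp(-_dist_sq(query, ex) / (2.0 * bandwidth ** 2)) for ex in memory]
    total = sum(weights)
    if total == 0.0:
        # Query is far from every exemplar: fall back to the nearest one.
        return list(min(memory, key=lambda ex: _dist_sq(query, ex)))
    return [
        sum(w * ex[j] for w, ex in zip(weights, memory)) / total
        for j in range(len(query))
    ]


class GaussianSPRT:
    """Sequential test of H0: residual mean 0 against H1: residual mean `shift`."""

    def __init__(self, shift, sigma, alpha=0.01, beta=0.01):
        self.shift = shift
        self.sigma = sigma
        self.upper = math.log((1.0 - beta) / alpha)  # crossing it accepts H1
        self.lower = math.log(beta / (1.0 - alpha))  # crossing it accepts H0
        self.llr = 0.0

    def update(self, residual):
        """Add one residual; return 'anomalous', 'normal' or 'undecided'."""
        self.llr += (self.shift / self.sigma ** 2) * (residual - self.shift / 2.0)
        if self.llr >= self.upper:
            self.llr = 0.0
            return "anomalous"
        if self.llr <= self.lower:
            self.llr = 0.0
            return "normal"
        return "undecided"
```

In use, each new observation `x` would yield a residual such as `x[j] - aakr_predict(memory, x,
bandwidth)[j]` for a chosen feature `j`, and that residual would be passed to `update`. The
parameters `alpha` and `beta` are the target false alarm and missed detection probabilities.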

Besides DoS attacks (ping flood, jolt2, bubonic, and simultaneous jolt2 and bubonic attacks),
the authors also consider insider attack scenarios.

Multi-Agent  IDS  Using  Ant  Clustering  Approach  and  Unsupervised  Feature  Extraction 

Tsang and Kwong [262] propose an unsupervised anomaly-learning model, the Ant Colony
Clustering Model (ACCM), in a multi-agent, decentralized IDS to reduce data dimensionality and
increase modeling accuracy. The bio-inspired idea is to group statistical patterns of network
data into near-optimal clusters for classification.

3.2.3  Configurable  Middleware-Level  Detection 

Naess et al. [195] present a configurable Embedded Middleware-level Intrusion Detection
System (EMISDS) framework. It is implemented within MicroQoSCORBA, a CORBA-based middleware
framework, with high configurability achieved through the Interface Definition Language (IDL)
compiler and code generation tools [178].

The system model comprises anomaly and misuse detection while leaving the flexibility
to specify the interaction of middle-level information within the IDS.

3.2.4 Intrusion Detection and Event Monitoring in SCADA Networks

Oman and Phillips [204] from the University of Idaho give a very clear exposition of the
implementation of a SCADA power-grid testbed for intrusion detection and event monitoring. They
are producing comprehensive intrusion signatures for unauthorized access to SCADA devices, in
addition to baseline-setting files for those devices.

3.2.5  Model  for  Cyber-Physical  Interaction 

Power Plant Interfacing Substations through Probabilistic Validation of Attack-Effect
Bindings (PVAEB)

Rrushi and Campbell [233] look into attacks on IEC 61850 [126], the protocol used for
communication between an electricity substation and a power plant (here, a nuclear power plant).

The authors present the semantic correlation between the dynamics of the nuclear reactors in the
power plant and those of the generated electricity provision in the substation through structural
equation modeling (SEM). For each logical node of IEC 61850, they apply Bayesian Belief
Networks (BBN) to enumerate the probability distributions attributed by its associated data
individually. The authors then use a Stochastic Activity Network (SAN) to verify such bindings
and to spot intrusions.

All constructions of attack effects are based on known failure models.

Workflow-based non-intrusive approach for enhancing the survivability of critical
infrastructures in Cyber Environment

Xiao et al. [282] propose a workflow-based approach, where workflow is a technique for
automating existing processes, to incorporate the detection of both known attack patterns and
known unsafe states.

This  work  leverages  the  presumably  existing  survivability-related  knowledge  and  protection 
scheme.  They  consider  that  each  essential  component  in  the  physical  layer  has  a  corresponding 
node  in  the  workflow. 

A  simplified  water  treatment  system  is  studied  through  simulation  to  illustrate  the  idea. 


3.3  Comparison 

The overall comparisons of the proposed systems are listed in Table 3.2 and Table 3.3. The
features used for comparison were chosen out of operational concerns as well as performance
considerations.

3.3.1  Intrusion  Detection 

In particular, we would like to look into the intrusion detection methods used in each system,
as seen in Table 3.4.

Name of System   Real traces               Deploy-  Imple-  Inter-  Type of   Audit    Granul-  Data
                                           ment     ment.   oper.   Response  Source   arity    Proc.
---------------  ------------------------  -------  ------  ------  --------  -------  -------  ------
PVAEB [233]      testbed                   no       yes     N/A     passive   host     batch    centr.
IBM NADS [191]   N/A                       N/A      yes     yes     passive   network  cont.    centr.
SRI Modbus [49]  testbed                   no       yes     yes     active    both     cont.    [?]
WFBNI [282]      simulation                no       yes     maybe   passive   network  cont.    centr.
SHARP [230]      N/A                       N/A      no      yes     active    network  cont.    centr.
IDEM [204]       testbed                   no       yes     yes     passive   network  cont.    centr.
AAKRSPRT [287]   testbed                   no       yes     yes     passive   host     cont.    centr.
EMISDS [195]     simulation w/o intrusion  no       no      N/A     N/A       both     batch    [?]
MAACUFE [262]    KDD-cup                   no       yes     N/A     active    both     N/A      [?]

[?] marks entries that are garbled in the source. The Scalability and Data Coll. columns could
not be aligned with the systems; their values, in source order, are:
Scalability: medium, [?], [?], low, low, low, high, high.
Data Coll.: centr., [?], [?], [?], [?], centr., centr., centr., [?], [?].

Table 3.2: Comparison of Intrusion Detection System Approaches

3.3.2 SCADA-Specific-ness

We compare how SCADA’s special needs are addressed in each proposed system, with
results shown in Table 3.5.


3.4  Evaluation 

3.4.1  Design  Pitfalls  and  Evaluation  Criteria 

Looking at standard IT IDSs, McHugh [176] criticizes many aspects of the DARPA/LL
evaluation. In terms of modeling, both signature and probabilistic IDSs model misuse, the illegal
behavior of an intrusion. Anomaly-based IDSs empirically and statistically model normal system
usage and behavior. Specification-based IDSs define what is allowable under protocol and policy
specification. All these model-based approaches bear certain common drawbacks:

•  Inaccurate  models  can  lead  to  false  alarms  and/or  missed  detections. 

•  Modeling  can  be  expensive  and  difficult  if  the  system  and/or  user  activity  is  complex. 

Anderson states [12]: “In general, if you build an intrusion detection system based on
data-mining techniques, you are at serious risk of discriminating.”

Paxson makes a similar argument from a more technical point of view [208]: one of the
pitfalls of machine learning based IDS techniques is the lack of illumination of the rationale
behind many approaches, namely why a given approach was chosen, and why it succeeds or fails.

Following Axelsson [15], McHugh [177] and Paxson [208], we look for

• soundness

• completeness

• timeliness

• choice of metrics, statistical models and profiles

• system design

• social implications

• feedback, or how to decide actionable events

The SCADA-specific angles we look at are the contributions, limitations or room for
improvement, and extensibility of each system, in terms of the following questions:

• How do they frame the work, including assumptions, logic and conclusions?

• What kind of security properties do they want to achieve? Do they achieve them, and how?

• What are their trust model, threat model and attack scenarios? How plausible are they?

• What illumination do they bring into the problem space?

• What is the selling point of their approach?

• What kind of detection algorithms have they used that suit SCADA systems particularly well,

1. either through leveraging the entrenched components and/or technologies used in the
specific SCADA physical systems under study,

2. or by restricting attention to a more focused and potentially narrower workspace that
is more relevant to the specific SCADA physical system under study when applying
generic methods?

• What are the subtle points they bring out that might have been left out by a
non-SCADA-security expert?

• What is unique in the cyber-physical interactions?

• How is the detection performance in terms of effectiveness and efficiency? Effectiveness is
reflected through a high detection rate and a low false alarm rate; efficiency, through low
overheads.

Name of System   Inter-op.  Type of Response  Self Security
---------------  ---------  ----------------  -------------
PVAEB [233]      N/A        passive           low
IBM NADS [191]   yes        passive           low
SRI Modbus [49]  yes        passive           medium
WFBNI [282]      N/A        passive           low
SHARP [230]      yes        active            high
IDEM [204]       yes        passive           low
AAKRSPRT [287]   yes        passive           low
EMISDS [195]     N/A        passive           low
MAACUFE [262]    N/A        active            N/A

The remaining columns are only sparsely filled in the source and could not be aligned with the
systems; their entries, in source order, are:
Interaction between Cyber-Physical: yes, yes.
Use of SCADA Components, communication protocol: IEC 61850, DNP3, Modbus, Modbus, SNMP.
Use of SCADA Components, hardware: simulated IED; SW: yes; HW: yes.
Domain/Industry: electrical power, N/A, water, N/A, electrical power, N/A, N/A, N/A.
Timeliness: yes.

Table 3.5: Comparison of SCADA’s Special Needs Being Addressed in Each Proposed System

3.4.2  Evaluation  Results 

Strength 

Intrusion  detection  research  for  SCADA  systems  to  date  has  been  quite  limited,  with  the  three 
most  prominent  and  critical  deficiencies  being 

•  the  lack  of  a  well-considered  threat  model; 

•  the  absence  of  addressing  false  alarm  and  false  negative  (mis-detection)  rates;  and 

•  the  need  to  empirically  ground  the  development  of  IDS  mechanisms  in  the  realities  of  how 
such  systems  operate  in  practice,  including  the  diversity  of  traffic  they  manifest  and  the  need 
to  tailor  IDS  operation  to  different  SCADA  environments. 
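
The weight of the second deficiency can be illustrated with Bayes' rule: when intrusions are
rare among monitored events, even a small false alarm rate means that most alarms are false.
The sketch below computes the probability that an alarm corresponds to a real intrusion. The
function name and the numbers in the usage note are hypothetical and are not measurements from
any SCADA system.

```python
def bayesian_detection_rate(prior_intrusion, detection_rate, false_alarm_rate):
    """P(intrusion | alarm), given the fraction of events that are intrusions,
    P(alarm | intrusion) and P(alarm | no intrusion)."""
    alarms_from_intrusions = prior_intrusion * detection_rate
    alarms_from_normal = (1.0 - prior_intrusion) * false_alarm_rate
    total_alarms = alarms_from_intrusions + alarms_from_normal
    if total_alarms == 0.0:
        return 0.0
    return alarms_from_intrusions / total_alarms
```

With an assumed prior of one intrusion per 10,000 events, a detection rate of 0.99 and a false
alarm rate of 0.01, fewer than one alarm in a hundred is a real intrusion; lowering the false
alarm rate to 0.00001 raises that fraction above nine in ten. Operators facing the first regime
would have little reason to trust the alarms.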

From the above evaluation of existing IDSs for SCADA systems, we can see that the current
bottlenecks faced by the research and design, and hence the implementation and deployment, of
IDSs for SCADA are the scarcity of access to operational SCADA system (network traffic) traces
and the lack of prudent yet novel threat models or attack scenarios.

Barely any of these systems has a performance evaluation of the false alarms that it generates.
However, given the availability demands of SCADA systems, we believe this is an issue that must
be addressed well before IDSs can be implemented and deployed in SCADA systems at large scale.

3.5 Future Directions

Ultimately, any viable technical solution or research direction in securing SCADA systems
must lie at the intersection of computer security, communication networks and control
engineering. However, the very large installed base of such systems means that in many instances
we must, for a long time to come, rely on retrofitted security mechanisms rather than having the
option to design them in from scratch. This leads to a pressing need for deployable, robust,
SCADA-specific intrusion detection systems (IDSs).

We shall aim to capture the characteristics of the specific SCADA system under study with full
situational awareness, including the dynamics of the physical plant being monitored, its
communication patterns, system architecture, network traffic behavior, and the specific
application-level protocols used.

3.5.1  Our  Future  Work 

We propose JIE⁶, a viable intrusion detection and self-hardening system for SCADA systems.

In terms of intrusion detection and prevention functionality, our proposed JIE would be able to

• efficiently detect and block cyber intrusions into SCADA systems in real operational
environments and in real time,

• without interrupting the control performance of the protected system,

• without creating extra operational burdens or operational reservations due to false alarms,

• in the presence of both malicious and messily benign network traffic.

The system must operate in a real-time, robust fashion, with performance adequate to meet the
demands of the dynamic cyber-physical interactions inherent to SCADA systems.


3.6  Discussion 

As  argued  by  Rakaczky  [224],  the  ease  of  deployment  requires  the  intrusion  detection/prevention 
strategy  to  minimize  the  associated  personnel  overhead. 

The model-based system for SCADA systems using Modbus/TCP addresses the Modbus protocol
encapsulated within TCP/IP. The idea can be generalized to other control system protocols as well.

Since SCADA networks are built of resource-constrained embedded systems, an IDS using
middleware-level detection has the advantage of directly accessing message signatures and
parameter values without decoding the raw network packets. But there is a tradeoff in the risk
involved in handling embedded responses to attacks.

6 This is the 40th hexagram of the I Ching (Yi Jing, The Book of Changes), which comprises 64
hexagrams plus their commentaries and transformations as strategic interpretations of chance
events. It literally means Problem Solving or Deliverance. The essence of this strategy is:
Don’t trouble troubles until trouble troubles you; if it does, then act quickly.

Both model-based intrusion detection and middleware-level intrusion detection build models
to specify the normal behavior of the network traffic and compare the SCADA traffic against these
models to detect potentially anomalous behavior. Model-based detection is an important
complement to signature-based approaches.

The specification-based IDS is particularly attractive for SCADA systems and networked
control systems in general.

Chapter 4

Xware  -  an  Overall  Architecture  of  a 
SCADA-specific  Security  Solution 


Security  is  a  process,  not  a  product. 

Bruce  Schneier 

A SCADA-specific defense-in-depth security engineering solution framework, Xware, shown
in Fig. 4.1, is presented in this chapter.

This system addresses the special needs entailed by SCADA systems by leveraging the
entrenched SCADA components and technologies. It provides reliable performance in the face of
malicious intrusions, unintentional faults, honest mistakes, benign noise and extreme cases, in
addition to predefined allowable behavior, and is thus very low in both false positive and false
negative rates. We give an overview of the system’s design with emphasis on a prudent threat
model. Xware comprises two strong footings: Normalcy Checking, a control-theoretic,
domain-knowledge-specific, specification-based payload inspection system, and a high-speed,
real-time, behavior-based NIDS (Network Intrusion Detection System). Xware integrates a Trust
Counter to verify the truthfulness of sensor measurements. It also addresses exfiltration of
confidential information from within the intranet. Moreover, Xware hardens the SCADA system
with compensation schemes that guarantee its performance when an intrusion evades the NIDS or
an unexpected fault occurs. It puts things in perspective and highlights the overall systematic
and holistic approach.

Figure 4.1: Xware: the overall architecture of a SCADA-specific Security Solution

Chapter 5

Trust Counter - Data Fusion Assurance for
the Kalman Filter in Uncertain Networks


Trust  is  cheaper  than  control 

Jon  Mell 

This chapter describes Trust Counter, an important component of the proposed Xware that
measures the trustworthiness of each sensor reading before fusing the readings in an
estimation-performance-centric way and feeding the result to a central location.

Due to standardization and connectivity to other networks, networked control systems, a vital
component of many nations’ critical infrastructures, face potential disruption. Such disruption
can affect the Kalman filter, the primary recursive estimation method used in control
engineering. To improve such estimation, data fusion may take place at a central location to
fuse and process multiple sensor measurements delivered over the network. In an uncertain
networked control system where the nodes and links are subject to attacks, false, compromised or
missing individual readings can produce skewed results. To assure the validity of data fusion,
this paper proposes a centralized trust rating system that evaluates the trustworthiness of each
sensor reading on top of the fusion mechanism. The ratings are represented by the Beta
distribution, which is the conjugate prior of the binomial distribution, so the posterior is
again a Beta distribution. An illustrative example then demonstrates its efficiency.

Control systems¹ are deeply ingrained in the fabric of critical infrastructure sectors, including
power grids; oil and gas pipeline systems; water treatment and distribution; and railroads and
mass transit. They are also widely used in vital enterprises such as manufacturing plants and
building climate control [79].

Most industrial plants now employ networked process historian servers that store process data,
along with other possible business and process interfaces². This integration of networked control
systems with other networks has made control systems vulnerable to various cyber threats. The
adoption of Ethernet and TCP/IP for process control networks, and of wireless technologies such as
IEEE 802.x, Zigbee, Bluetooth, and WiFi [64, 153], has further reduced the isolation of control
networks. The connectivity and de-isolation of a control system is manifested in Fig. ??.
Furthermore, the recent trend toward standardization of the software and hardware used in control
systems makes it possible to mount control-specific attacks. Continuous availability requirements,
hard deadlines, legacy issues, and the low computational power of the end devices are among the
factors that have kept readily available security measures from immediate implementation and
deployment.

¹ Control systems are computer-based systems that are used in many industries to monitor and
control sensitive processes and physical functions [79].

² For example, using remote Windows sessions to Distributed Control Systems or direct file
transfer from Programmable Logic Controllers to spreadsheets.

Such uncertainty may affect the performance of networked control systems. Specifically, we address
its likely impact on Kalman-filter-based estimation, a key functionality of control systems, and
propose a possible countermeasure.

Typically,  a  central  location  collects  measurements  from  multiple  sensors  to  achieve  higher 
accuracy  in  estimation  as  shown  in  Fig  5.1. 


Figure  5.1:  An  Example  of  Centralized  Data  Fusion  for  Networked  Control  Systems 

The  discrete  time  linear  dynamical  system  and  measurement  model  are  the  following,  where  i 
is  the  index  of  sensors. 


$$ x_{t+1} = A x_t + w_t \tag{5.1} $$

$$ y_{i,t} = C_i x_t + v_{i,t} \tag{5.2} $$

where $x_t \in \mathbb{R}^n$ is the state vector, $y_{i,t} \in \mathbb{R}^m$ is the output vector,
$w_t \in \mathbb{R}^p$ is white Gaussian noise with zero mean and covariance $Q > 0$, and the
$v_{i,t}$'s $\in \mathbb{R}^m$ are white Gaussian noises with covariance $R_i > 0$. $w_t$ and the
$v_{i,t}$'s are independent. The initial system state $x_0$ is Gaussian with zero mean and
covariance $\Sigma_0$. We assume $x_0$ is independent of $w_t$ and the $v_{i,t}$'s.

The individual measurements $y_{i,t}$ then undergo fusion before being fed into the Kalman filter,
as discussed further in later sections.

Next, we briefly recap the standard Kalman filtering algorithm and the Kalman-filter-based data
fusion methods in a theoretically benign setting, and mention two well-known examples of trust
rating systems used in practice to deal with potentially malicious situations.

5.0.1 Standard Kalman Filter

Define

$$
\begin{aligned}
\hat{x}_{t|t} &= E[x_t \mid y_t] \\
P_{t|t} &= E[(x_t - \hat{x}_{t|t})(x_t - \hat{x}_{t|t})' \mid y_t] \\
\hat{x}_{t+1|t} &= E[x_{t+1} \mid y_t] \\
P_{t+1|t} &= E[(x_{t+1} - \hat{x}_{t+1|t})(x_{t+1} - \hat{x}_{t+1|t})' \mid y_t] \\
\hat{y}_{t+1|t} &= E[y_{t+1} \mid y_t].
\end{aligned}
$$

The prediction phase of the Kalman filter, which computes $\hat{x}_{t+1|t}$ and $P_{t+1|t}$, is
independent of the observation process:

$$ \hat{x}_{t+1|t} = A \hat{x}_{t|t} \tag{5.3} $$

$$ P_{t+1|t} = A P_{t|t} A' + Q \tag{5.4} $$

For  the  update  phase  of  the  Kalman  filter,  we  have 


$$ \hat{x}_{t+1|t+1} = \hat{x}_{t+1|t} + P_{t+1|t} C' (C P_{t+1|t} C' + R)^{-1} (y_{t+1} - C \hat{x}_{t+1|t}) \tag{5.5} $$

$$ P_{t+1|t+1} = A P_{t|t} A' + Q - P_{t+1|t} C' (C P_{t+1|t} C' + R)^{-1} C P_{t+1|t} \tag{5.6} $$
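
As a minimal sketch of the prediction and update recursion in (5.3)-(5.6), the following Python function runs one filter cycle for the scalar special case (n = m = 1), where A, C, Q and R reduce to numbers. The function name `kalman_step` and the numeric values in the usage lines are assumptions made for illustration only; they do not come from the original text.

```python
def kalman_step(x_est, p_est, y_next, a, c, q, r):
    """One predict/update cycle of a scalar Kalman filter.

    x_est, p_est : posterior mean and variance at time t
    y_next       : measurement received at time t+1
    a, c, q, r   : scalar stand-ins for A, C, Q and R
    """
    # Prediction phase, equations (5.3)-(5.4).
    x_pred = a * x_est
    p_pred = a * p_est * a + q

    # Update phase, equations (5.5)-(5.6).
    gain = p_pred * c / (c * p_pred * c + r)
    x_new = x_pred + gain * (y_next - c * x_pred)
    p_new = p_pred - gain * c * p_pred
    return x_new, p_new


# Illustrative values only (assumed for this sketch): a constant state
# observed three times with unit measurement noise.
x, p = 0.0, 1.0
for y in [1.0, 1.0, 1.0]:
    x, p = kalman_step(x, p, y, a=1.0, c=1.0, q=0.0, r=1.0)
```

In this toy run the posterior variance `p` shrinks with every measurement, which is the behaviour that motivates fusing more sensors.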

The  accuracy  of  measurement  improves  as  more  sensors  collaborate.  Naturally,  this  leads  to 
the  question  of  how  to  fuse  data  from  multiple  sensors. 

5.0.2  Data  Fusion 

The two most commonly used methods for Kalman-filter-based data fusion are state-vector fusion and
measurement fusion [76]. State-vector fusion forms a joint state estimate from the individual
estimates that each sensor produces with its own Kalman filter, whereas measurement fusion
directly fuses the sensor measurements to obtain a weighted measurement and feeds it into a single
Kalman filter to derive a final state estimate.

The measurement fusion method provides better overall estimation performance and demands a
relatively lower computational load on each sensor node. The state-vector fusion method is only
effective when the Kalman filters are consistent [76], and modeling errors introduced by
linearization in many realistic applications often violate this condition. For this reason, we
focus our attention on measurement fusion to illustrate the idea.

Note that so far we have only discussed a benign setting, whereas in reality there are many
malicious situations. To motivate our problem formulation and proposed solution, we name two
well-known examples of systems in practice that handle such uncertainty.

5.0.3 Trust Rating Systems

Google uses robots to crawl web pages and store their information in its database in order to
calculate PageRank values. Google is therefore characterized as a centralized reputation system
[286].

Netscape  8  includes  a  new  “Trust  Rating”  system  that  attempts  to  tell  users  which  sites  are 
“safe”.  Netscape  shows  an  on-screen  indication  when  it  believes  a  site  to  be  trustworthy  [65]. 

Each system includes a component, or trust counter, to compute and store related trustworthiness
information.

Paper  Organization 

After motivating the problem, Section 5.1 gives the problem formulation, including the fusion
method, the trust and threat models, and the overall assurance idea; Section 5.2 explains in
detail how the trust rating system works; and Section 5.3 shows a simple illustrative example.


5.1  Problem  Formulation 

Among  several  possible  methods  for  measurement  fusion,  we  choose  to  fuse  observations  from 
different  sensors  with  the  inverse  of  the  sensor’s  variance  as  weighting  factor. 

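$$ y_t = \Big[ \sum_{i=1}^{N} R_i^{-1}(t) \Big]^{-1} \sum_{i=1}^{N} R_i^{-1}(t)\, y_{i,t} \tag{5.7} $$

where the sums run over all $N$ sensors. This method is optimal in the minimum-mean-square-error
(MMSE) sense and keeps the dimension of the observation vector unchanged, which lowers the
computational load. Note that the noise covariance of the fused measurement takes the form
$R_t = \big[ \sum_{i=1}^{N} R_i^{-1}(t) \big]^{-1}$. We call this functionality the fuser.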
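
The inverse-variance weighting described above can be sketched in a few lines for scalar readings, where each covariance reduces to a single variance. The function name `fuse_measurements` and the sample readings are hypothetical and serve only to illustrate the weighting; this is not the dissertation's implementation.

```python
def fuse_measurements(readings, variances):
    """Inverse-variance weighted fusion of scalar sensor readings.

    Returns the fused reading and the variance of the fused reading,
    i.e. the scalar analogues of the fused measurement and of its
    noise covariance.
    """
    if not readings or len(readings) != len(variances):
        raise ValueError("need exactly one variance per reading")
    weights = [1.0 / v for v in variances]      # scalar form of R_i^{-1}
    fused_var = 1.0 / sum(weights)              # [sum_i R_i^{-1}]^{-1}
    fused = fused_var * sum(w * y for w, y in zip(weights, readings))
    return fused, fused_var


# Two equally noisy sensors: the fused reading is their average and the
# fused variance is half of the individual variance.
fused, fused_var = fuse_measurements([1.0, 3.0], [1.0, 1.0])
```

A noisier sensor receives a smaller weight, so its reading pulls the fused value less than a precise sensor does.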

Before moving on to the details of the assurance system, we outline the trust and threat models.

5.1.1  Trust  Model 

We assume the central location, where the fuser and trust counter reside, is secure³.

5.1.2  Threat  Model 

We assume that the nodes and links are in an uncertain environment that is subject to attacks
from the outside world. Attacks can affect the integrity and availability of the data; for
example, a man-in-the-middle attack may change or delete the data content. Alternatively, if
certain links are taken down, the absence of data from certain nodes may be mistaken for readings
of zero.

³ By resorting to central processing, we avoid potential attacks such as bad-mouthing in
distributed systems.

5.1.3 Assurance

Facing  these  potential  threats,  we  add  a  trust  rating  system  (Fig  5.2)  with  details  in  section  5.2. 


Figure  5.2:  The  Architecture  for  Fusion  Assurance 

The architecture adds a trust counter, which maintains the trustworthiness and untrustworthiness
values of each node, on top of the original fusion mechanism, as seen in Fig 5.2.

5.2  Trust  Rating  System 

$\alpha_i$ and $\beta_i$ represent the trustworthiness and untrustworthiness ratings of node $i$,
respectively, and are determined by equation (5.8). These two values lie between 0 and 1 and
depend on how much the existing overall median shifts when the reading from this particular node
is introduced. If the new median $m_i$ deviates from the existing median $m$ by more than a preset
threshold, namely $|m_i - m| > \text{Threshold}$, the node has untrustworthiness 1 and
trustworthiness 0. If its reading introduces no notable difference from the existing median, the
node has trustworthiness 1 and untrustworthiness 0. Otherwise, if the resulting change is within
the threshold, $|m_i - m| < \text{Threshold}$, its trustworthiness is proportional to the change
it introduced relative to the threshold value, $|m_i - m| / \text{Threshold}$. It is worth
pointing out that the median of all measurements $y_i$ is a robust metric for quantifying an
individual measurement [118].

In fact, the trust ratings are represented by the Beta distribution [135] with $\alpha$ and
$\beta$ as its parameters:

$$ \mathrm{Beta}(\alpha, \beta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\, x^{\alpha-1} (1-x)^{\beta-1}, \quad 0 \le x \le 1,\ \alpha > 0,\ \beta > 0. \tag{5.9} $$


The central counter updates the trust ratings of node $i$ based on $r_i$ truthful and $s_i$ bogus
observations. Because each observation is binary, i.e., either truthful or bogus, the counts
follow a binomial distribution. The Beta distribution is the conjugate prior of the binomial
distribution, so the posterior is also a Beta distribution. Bayesian parameter estimation for the
binomial distribution then gives


$$ \frac{\mathrm{Bin}(r_i + s_i,\, r_i) \times \mathrm{Beta}(\alpha_i, \beta_i)}{\text{Normalization}} = \mathrm{Beta}(\alpha_i + r_i,\, \beta_i + s_i) \tag{5.10} $$
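
The conjugacy behind (5.10) can be checked in one line. Writing $p$ for the unknown probability that a reading from node $i$ is truthful (the symbol $p$ is introduced here only for this derivation), the binomial likelihood of $r_i$ truthful and $s_i$ bogus observations multiplied by the Beta prior gives

```latex
\underbrace{\binom{r_i + s_i}{r_i}\, p^{r_i} (1-p)^{s_i}}_{\text{binomial likelihood}}
\times
\underbrace{\frac{\Gamma(\alpha_i+\beta_i)}{\Gamma(\alpha_i)\,\Gamma(\beta_i)}\,
            p^{\alpha_i-1} (1-p)^{\beta_i-1}}_{\text{Beta prior}}
\;\propto\;
p^{\alpha_i + r_i - 1}\, (1-p)^{\beta_i + s_i - 1}
```

The right-hand side is the kernel of a Beta distribution with parameters $\alpha_i + r_i$ and $\beta_i + s_i$; every factor that was dropped is constant in $p$ and is absorbed by the normalization.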


5.2.1  Update  Algorithm 

The sequences of truthful and bogus observations for a given measurement evolve as the status of
the uncertain network varies. We must therefore update the ratings to reflect the latest status:

$$ r_i^{t} = \lambda\, r_i^{t-1} + \alpha_i, \qquad s_i^{t} = \lambda\, s_i^{t-1} + \beta_i, \tag{5.11} $$

where $\lambda$ is a discounting factor ranging from 0 to 1 that reflects the fact that the older
the information, the less it is worth.

Thus the future (projected) truthfulness of a measurement from a given node can be estimated as

$$ T_i = E[\mathrm{Beta}(r_i + 1,\, s_i + 1)] = \frac{r_i + 1}{r_i + s_i + 2}. \tag{5.12} $$

Hence the fused measurement under assurance is

$$ y_t = \Big[ \sum_{i=1}^{N} T_i R_i^{-1}(t) \Big]^{-1} \sum_{i=1}^{N} T_i R_i^{-1}(t)\, y_{i,t}, \tag{5.13} $$

where $T_i$ is the truthfulness of each corresponding node measurement, as determined by the
central trust rating system, and $N$ is the number of sensors.
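
The update and fusion steps of this section can be sketched together for scalar readings. All function names, the discount value and the scenario below are assumptions made for illustration; in particular, the ratings are set to the two extreme cases described in Section 5.2, (1, 0) for a trusted reading and (0, 1) for a distrusted one, rather than computed from the median rule.

```python
def update_counts(r_prev, s_prev, alpha, beta, discount):
    """Discounted update of the truthful/bogus counts, equation (5.11)."""
    return discount * r_prev + alpha, discount * s_prev + beta


def expected_truthfulness(r, s):
    """Mean of Beta(r + 1, s + 1), equation (5.12)."""
    return (r + 1.0) / (r + s + 2.0)


def trusted_fusion(readings, variances, trust):
    """Trust-weighted inverse-variance fusion, scalar form of (5.13)."""
    weights = [t / v for t, v in zip(trust, variances)]   # T_i * R_i^{-1}
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("all readings have zero trust")
    return sum(w * y for w, y in zip(weights, readings)) / total


# Hypothetical scenario: three sensors, the third one rated bogus five
# times in a row while the other two are rated truthful.
discount = 0.9
counts = [(0.0, 0.0)] * 3
for _ in range(5):
    ratings = [(1.0, 0.0), (1.0, 0.0), (0.0, 1.0)]       # (alpha_i, beta_i)
    counts = [update_counts(r, s, a, b, discount)
              for (r, s), (a, b) in zip(counts, ratings)]

trust = [expected_truthfulness(r, s) for r, s in counts]
readings = [1.0, 1.2, 9.0]                                # third is an outlier
fused = trusted_fusion(readings, [1.0, 1.0, 1.0], trust)
plain = sum(readings) / len(readings)                     # fusion without trust
```

In this scenario the distrusted third reading still contributes, but with a much smaller weight, so `fused` stays far closer to the two consistent readings than the plain average does.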


5.3  Example 

As  an  illustration,  in  this  section,  we  demonstrate  the  idea  through  simple  examples. 

There  are  30  identical  sensors  uniformly  distributed  over  the  surveillance  region.  We  model 
the  discrete  dynamics  and  measurement  of  the  evader  as 

$$ x_{t+1} = A_e x_t + w_t $$

$$ y_{i,t} = C_i x_t + v_{i,t} \tag{5.14} $$

where $w$ and $v$ are white Gaussian noises with zero mean and covariances
$Q_e = \mathrm{diag}(0.15^2, 0.15^2, 0.15^2, 0.15^2)$ and $R_i = R = \mathrm{diag}(0.15^2, 0.15^2)$,
and $\delta = 0.5$ is the sampling period.


Figure  5.3:  Tracking  without  Trust  Rating 


Figure  5.4:  Tracking  with  Trust  Rating 

From Fig. 5.3 and Fig. 5.4, we can see that the accuracy improves for measurements with trust
rating.

The same holds true when we use 1000 nodes and observe how the estimation error varies as more
readings are compromised, as shown in Fig. 5.5.

Figure 5.5: Estimation error versus number of compromised readings. The dotted line (...) is with
trust rating; the solid line (-) is without.

5.4  Related  Work 

Some works, such as [77], attempt to use reputation frameworks in distributed systems. However,
it is hard to work around problems such as a compromised node becoming the message-passing leader
or bad-mouthing by compromised nodes.

In our setting, by contrast, we believe it is feasible to apply this method in a modular fashion
such that the trusted computing base is limited to the central location only.


5.5  Discussion 

In a networked control system setting, where the nodes and links are subject to attacks, a
centralized trust rating system shows the potential to assure the validity of the nodes'
readings. Because it uses the Beta distribution, it only requires storing two parameters, which
makes it simple yet intuitive. This approach provides intermediate assurance for the data fusion
used by the Kalman filter before full-scale security solutions are implemented in networked
control systems. In particular, this mechanism can facilitate the disambiguation between honest
yet rare events and malicious ones. It is implemented in our follow-on work.

Chapter 6

Robust  General  Likelihood  Ratio  Test 


Faster  Higher  Stronger 

Olympic  Motto 

This  chapter  gives  the  gist  of  Robust  General  Likelihood  Ratio  Test  (RGLRT)  in  the  context  of 
SCADA  security  in  particular. 

The adoption of large-scale Wireless Sensor Networks (WSN) has enabled Supervisory Control And
Data Acquisition (SCADA) systems to perform critical remote monitoring. Meanwhile, such large
networks are prone to benign component failures and malicious attacks. To address these problems,
we present an early anomaly detection and resilient estimation scheme for cyber-physical systems,
networked control systems to be specific, in an uncertain network environment. It robustly
identifies and detects outliers among real-time multidimensional measurements of dynamical
systems by using an online window-limited sequential Robust Generalized Likelihood Ratio (RGLR)
test without any prior knowledge of the occurrence time and distribution of the outliers. The
robust sequential testing and quick detection scheme achieves the optimal stopping time with low
rates of both false alarm and misdetection. We propose a set of qualitative and quantitative
metrics to measure its optimality in the context of cyber-physical systems.

Further, this resilient and flexible estimation scheme robustly rectifies and cleans data in the
presence of both isolated and patchy outliers while maintaining the optimality of the Kalman
filter under nominal conditions. We show the approximate optimality of the robustification
performance through stochastic approximation. We also offer a simple simulation example to
illustrate our ideas.

Supervisory Control And Data Acquisition (SCADA) systems are deeply ingrained in the fabric of
critical infrastructure sectors, including power grids; oil and gas pipeline systems; water
treatment and distribution; and railroads and mass transit. They are also widely used in vital
enterprises such as manufacturing plants and building climate control [79]. The Wireless Sensor
Network (WSN) is an emerging application in SCADA systems. In the monitoring and control of
moving or remote machinery, wireless sensor networks have compelling economic and engineering
advantages over their wired counterparts [218]. They may also deliver crucial information in real
time from environments and processes where data collection is impossible or impractical with
wired sensors. Individual sensors simultaneously sense a process and transmit the measured
information over a lossy wireless network to a control center, which processes the data and
produces an optimal estimate of the state.

However, the uncertainties in the SCADA system itself [296] and in the wireless sensor networks,
including both benign component faults and malicious attacks, may skew the sensor measurements
and thus the resulting estimates and control commands.

Our motivation for addressing outlier detection and mitigation is multifaceted. First, outliers
are often a clear indication of the environmental noise level and of potential faults in sensors
or malicious attacks in the system [306]. As for their impact on applications, the performance of
linear least squares estimates may in general degrade remarkably when plant or observation
disturbances are non-Gaussian, particularly when the non-Gaussianity, i.e., the outliers, is of a
heavy-tailed variety that gives rise to occasional very large values [265, 116, 117]. In light of
the prevalent and broad usage of the Kalman filter in engineering fields, and in SCADA systems in
particular, we are mostly interested in the skewing impact that outliers [179] have on the Kalman
filter, among the many other decision-making algorithms that are subject to outliers. The state
estimation error can grow without bound, since the estimate is a linear function of the
observation noise. Outliers thus skew and degrade the performance of many decision-making
algorithms, including the standard Kalman filter, and can potentially lead to divergence [74] and
instability [238] and destabilize the whole controller.

On the other hand, the difficulty of online outlier detection lies in the fact that moment-based
procedures are themselves not robust to outliers [30, 120]. Furthermore, the fact that
adversaries have control over inputs makes the detection task more complicated.

The CUSUM (Cumulative Summation) method and its variants are widely used for anomaly detection.
As pointed out in [25] and [254], its major drawback is that it requires a priori knowledge of
the post-change information, e.g., the intensity of the anomaly. In practice, however, such
information is not predictable. Given that our work is closely related to CUSUM, sequential
analysis, and hypothesis testing in general, we give a brief but more detailed exposition of the
related sequential testing approaches in Section 6.1.

To address robustness issues, [310] proposes a filtering technique that ensures an estimation
error variance with a guaranteed upper bound, given norm-bounded time-varying parameter
uncertainty in both the system state and output measurement matrices. Their focus does not
include outlier detection, though. [260] uses a weighted least squares-like approach that
introduces a weight for each data sample. A data sample with a smaller weight contributes less
when estimating the state at the current time step. They treat the problem as an expectation
maximization (EM) learning problem, with maximization over all available data points at every
time step, while using a variational factorial approximation of the true posterior distribution
to obtain analytically tractable inference. [132] uses a Kalman filter to remove drifting
tracking points, since the flow-based tracking approach is prone to outliers due to its aperture
problem.

Hammes [95] studies robust positioning algorithms for transmitter devices over wireless networks,
where non-line-of-sight propagation effects lead to erroneous signal parameter estimates. The
framework of an extended Kalman filter (EKF) is rewritten into a linear regression model at each
time step, while non-parametric pdf estimation is used for position estimation within a
parametric signal model to solve for the position and velocity of the user equipment.

Contributions of our work:

• we offer a simplified taxonomy/comparison of change detection methods;

• we present a resilient and flexible estimation scheme that robustly rectifies and cleans data
in the presence of both isolated and patchy outliers while maintaining the optimality of the
Kalman filter under nominal conditions;

• we propose an online window-limited sequential Robust Generalized Likelihood Ratio (RGLR) test
that requires no prior knowledge of the occurrence time or the distribution of the outliers;

• the robust sequential testing achieves the optimal stopping time, i.e., the asymptotically
shortest detection delay while maintaining the lowest false alarm rate.

The rest of this paper is organized as follows. Section 2 gives a brief exposition of hypothesis
testing and a taxonomy/comparison of related work; Section 3 states the problem formulation,
including performance metrics; Section 4 presents the resilient estimation; Section 5 describes
the scheme for outlier detection; Section 6 shows simulation results, evaluation, and discussion;
and Section 7 concludes.


6.1  Hypothesis  Testing 

In  this  section,  we  give  an  overall  review  of  hypothesis  testing,  sequential  analysis  and  detec¬ 
tion  before  listing  a  simplified  taxonomy. 

Let $\mathfrak{M}$ be the set of probability measures on the real line $\mathbb{R}$ and let
$P_0, P_1$ be two distinct elements of $\mathfrak{M}$, having densities $p_0, p_1$ with respect to
some measure $\mu$. Denote by $\{z_k\}_{1}^{\infty}$ a sequence of independent and identically
distributed (iid) observations of a random variable $Z$ with distribution $D$. The testing
problem consists of the hypotheses

$$ \begin{cases} H_0 : D = P_0 \\ H_1 : D = P_1 \end{cases} \tag{6.1} $$

Let $p_{\theta_i}$, dependent on a parameter $\theta_i$, be the respective densities of $P_i$ for
$i = 0, 1$ with respect to some dominating measure $\mu$.

To discriminate between the two hypotheses, we may use either the likelihood ratio test provided
by the Neyman-Pearson lemma or Wald's sequential probability ratio test.

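Recall that the log-likelihood ratio is defined as

$$ s(z) = \ln \frac{p_{\theta_1}(z)}{p_{\theta_0}(z)}, \qquad S_n = \sum_{k=1}^{n} s(z_k) = \sum_{k=1}^{n} \ln \frac{p_{\theta_1}(z_k)}{p_{\theta_0}(z_k)}. \tag{6.2} $$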


6.1.1  Fixed  Sample  Size  Test 

For the Neyman-Pearson test, the sample size is fixed and we reject hypothesis $H_0$ if $S_n$ is
too large.

6.1.2 Sequential Probability Ratio Testing

Wald's Sequential Hypothesis Testing (SHT), or Sequential Probability Ratio Testing (SPRT),
scheme [270], introduced in 1947, not only enjoys a sample size as small as that of single
sampling schemes when detecting large changes, but also retains a desirable expected sample size
before action is taken when dealing with changes of small magnitude [205].

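The task of SHT becomes

$$ \begin{cases} S_0 = 0 \\ S_{k+1} = S_k + \ln \dfrac{p_{\theta_1}(z_{k+1})}{p_{\theta_0}(z_{k+1})} \\ N = \inf\{ n \ge 1 : S_n \notin [L, U] \}, \end{cases} \tag{6.3} $$

The SHT decision rule $d_N$ follows,

$$ d_N = \begin{cases} H_0 & \text{if } S_N < L \\ H_1 & \text{if } S_N > U \end{cases} \tag{6.4} $$

where $L \approx \ln \frac{F_n}{1 - F_a}$ and $U \approx \ln \frac{1 - F_n}{F_a}$, with $F_a$
being the predefined false alarm rate and $F_n$ the predefined false negative rate, or missed
detection rate, both chosen and tuned by the user.

Assume that under hypothesis $H_0$ the observations have distribution $P_0$ with probability
function $p_0$, and under $H_1$ distribution $P_1$ with probability function $p_1$. Pick two
numbers $a, b$ with $a < 0 < b$ and define the decisive sample number (the stopping rule or the
detection rule)

$$ N = \inf\{ n \ge 1 : S_n \le a \text{ or } S_n \ge b \} \tag{6.5} $$

with $\inf \emptyset = \infty$.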

Wald [270] proved that $N$ is almost surely finite under both $P_0$ and $P_1$. The testing
procedure is to stop at stage $N$ and reject $H_0$ if $S_N \ge b$, and accept $H_0$ (hence reject
$H_1$) if $S_N \le a$. We denote this test $\mathrm{SPRT}(a, b, P_0, P_1)$. The average sample
numbers are $E_j[N]$, $j = 0, 1$, where $E_j$ denotes expectation under $P_j$. The error
probabilities are $\alpha = P_0(S_N \ge b)$ and $\beta = P_1(S_N \le a)$. The SPRT is optimum in
the following sense. Consider any other testing procedure with corresponding elements
$\alpha', \beta', E_0[N]', E_1[N]'$; then (cf. Lehmann 1959 [159]) it holds that

$$ \begin{cases} \alpha' \le \alpha \\ \beta' \le \beta \end{cases} \;\Rightarrow\; \begin{cases} E_0[N] \le E_0[N]' \\ E_1[N] \le E_1[N]' \end{cases} \tag{6.6} $$

SPRT's major strength is twofold: it is a recursive online scheme, and it is optimal in sample
size under both hypotheses, with theoretically proven bounds. However, it assumes that
$\theta_1$, the parameter of the distribution after the change, is known, whereas in reality, and
especially for the goal of this paper, it is not.
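
A minimal sketch of the SPRT, assuming Wald's approximate log-scale thresholds and a Gaussian mean-shift example with known variance, is given below. The function names, the Gaussian choice and the error rates are illustrative assumptions and are not taken from the original text; the stopping rule follows the form of (6.5).

```python
import math


def wald_thresholds(false_alarm, missed_detection):
    """Wald's approximate SPRT thresholds (lower < 0 < upper), log scale."""
    lower = math.log(missed_detection / (1.0 - false_alarm))
    upper = math.log((1.0 - missed_detection) / false_alarm)
    return lower, upper


def gaussian_llr(z, mu0, mu1, sigma):
    """ln p1(z)/p0(z) for N(mu1, sigma^2) against N(mu0, sigma^2)."""
    return (mu1 - mu0) / sigma ** 2 * (z - 0.5 * (mu0 + mu1))


def sprt(samples, llr, lower, upper):
    """Run the SPRT and return (decision, stopping time).

    decision is "H0", "H1", or None when the samples run out before
    the cumulative log-likelihood ratio leaves (lower, upper).
    """
    s = 0.0
    for n, z in enumerate(samples, start=1):
        s += llr(z)                 # recursion S_n = S_{n-1} + s(z_n)
        if s <= lower:
            return "H0", n
        if s >= upper:
            return "H1", n
    return None, len(samples)


def llr(z):
    # Assumed example: test mean 0 against mean 1 with unit variance.
    return gaussian_llr(z, mu0=0.0, mu1=1.0, sigma=1.0)


lower, upper = wald_thresholds(0.01, 0.01)
decision, stop = sprt([1.0] * 20, llr, lower, upper)
```

The sketch needs `mu1` up front, which is exactly the limitation noted above: the post-change distribution must be known before the test can be run.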


Sequential  Detection 

Closely related to sequential testing theory is the theory of sequential change-point detection.
Page [205] and Shiryaev [248] modified Wald's SPRT and developed the cumulative sum (CUSUM) [205]
and the Shiryaev-Roberts charts [248], respectively, to improve the sensitivity of the Shewhart
charts [247]. The goal of optimality in the Shiryaev-Roberts-Pollak (SRP) sense is to minimize the
worst-case average delay subject to an upper bound on the false alarm rate, whereas optimality in
Lorden's sense is to minimize the upper bound of the worst-case delay subject to an upper bound on
the false alarm rate [166].

The CUSUM [26, 33, 88, 188] test is one of the most successful algorithms for sequential change
detection. The CUSUM procedure, developed in 1954, calculates the cumulative sum of samples from a
process $x_n$ with weights $\omega_n$ in the following fashion,

$$ \begin{cases} S_0 = 0 \\ S_{n+1} = \max(0,\, S_n + x_n - \omega_n) \end{cases} $$

The stopping rule or the detection rule is that when the value of $S$ exceeds a certain threshold
value, a change in value has been found¹.
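
The CUSUM recursion and its stopping rule can be sketched as follows, assuming the standard one-sided form S_0 = 0, S_{n+1} = max(0, S_n + x_n - w_n) with one weight per sample. The function name `cusum`, the constant weights and the threshold are hypothetical values chosen for illustration.

```python
def cusum(samples, weights, threshold):
    """One-sided CUSUM change detector.

    Returns the 1-based index of the first sample at which the
    statistic S exceeds the threshold, or None if it never does.
    """
    s = 0.0
    for n, (x, w) in enumerate(zip(samples, weights), start=1):
        s = max(0.0, s + x - w)     # reset to zero instead of going negative
        if s > threshold:
            return n
    return None


# Assumed example: the mean jumps from 0 to 1 after ten samples.
samples = [0.0] * 10 + [1.0] * 10
weights = [0.5] * len(samples)
alarm = cusum(samples, weights, threshold=2.0)
```

The weights act as an allowance: samples below the weight pull the statistic back toward zero, so only a sustained upward shift accumulates enough to cross the threshold.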

Widespread applications and theoretical developments in quality control [168, 188, 235], fault
detection [51, 276], surveillance [121, 133], and anomaly detection [252, 172] stem from CUSUM
and CUSUM-like procedures.

Some of the methods proposed over the years were originally ad hoc procedures and were later
proven to possess optimality properties, including both Wald's SPRT and Page's CUSUM. Others,
such as the Shewhart [247] and Exponentially-Weighted Moving Average (EWMA) [228] control charts,
remain popular though suboptimal.

The overall comparison and a simplified taxonomy are summarized in Table 6.1.

For more detailed reviews of sequential analysis and of sequential change-point detection
involving multivariate and dependent observations, interested readers may refer to [154] and
[25], respectively.


6.2  Problem  Formulation 

First  we  recap  estimation  and  identification  in  state-space  models  and  the  statistical  approach 
based  on  the  Kalman  filter  and  likelihood  techniques. 

6.2.1  A  General  State  Space  Model  Setting 

Let the integer $k = 0, 1, \ldots$ denote discrete time; then the stochastic state-space model in
discrete time has the following form:

$$ \text{state:} \quad x_{k+1} = F_k x_k + G_k u_k + w_k \tag{6.7} $$

$$ \text{observation:} \quad y_k = H_k x_k + J_k u_k + v_k \tag{6.8} $$

where
$x_k \in \mathbb{R}^n$ is the (hidden) internal state vector,
$u_k \in \mathbb{R}^r$ is the input vector,
$y_k \in \mathbb{R}^m$ is the output, i.e., observation (measurement), vector,
$w_k \in \mathbb{R}^r$, the process (plant) noise vector in (6.7), is a white Gaussian noise
sequence with zero mean and covariance matrix $Q_k > 0$,
$v_k \in \mathbb{R}^m$, the observation (measurement) noise vector in (6.8), is a white Gaussian
noise sequence with zero mean and covariance matrix $R_k > 0$, and
$\{F_k\}$, the state transition matrix, $\{H_k\}$, the observation matrix, and $\{G_k\}$ and
$\{J_k\}$, the control matrices, are known sequences of matrices with appropriate dimensions.

The initial system state vector $x_0$ is Gaussian with zero mean and covariance matrix $P_0$. We
assume that the initial state $x_0$ and the two noise sequences $w_k$, $v_k$ are mutually
independent. We will use observation and measurement interchangeably.

In summary, (6.7) is a recursive state model of the linear dynamical process (plant), and (6.8)
is a linear observation model of the system. Note that such a model (6.7)-(6.8) is a Markov
model, namely the pair $(X_{k+1}, Y_k)$ is a Markov process.

¹ Note that the CUSUM recursion given earlier only detects changes in the positive direction.
When negative changes need to be found as well, the min operation should be used instead of the
max operation, and this time a change has been found when the value of $S$ is below the
(negative) value of the threshold value.

6.2.2  Kalman  Filter 

The Kalman filter provides one particular estimate of the state $x_k$ of the system (6.7)-(6.8).
It is a minimum variance estimate of the state, namely the conditional mean² of $x_k$ given the
past observations $\{\ldots, y_{k-2}, y_{k-1}\}$. We denote this one-step-ahead prediction as
$\hat{x}_{k+1|k}$.

As shown in Fig. 6.1, the overall flow diagram of the Kalman filter, it is an online recursive
algorithm. To illustrate its recursion, we decompose its procedure into two phases, namely the
prediction phase and the measurement update phase.

Fig. 6.2 illustrates the recursive procedure of the Kalman filter, noting that at each time step
only the current and previous steps are involved. That is to say, no batch operation is required.
This is precisely what makes the Kalman filter an online algorithm.

6.2.3  Outliers’  Distribution  Model 

We point out that employing an outlier distribution model only gives us a reasonably plausible
and tractable model for generating outliers [174] and for illustrating the impact of outliers on
estimation performance. This is not to say that our detection scheme depends on the outliers'
distribution; otherwise it would be neither robust nor effective.

There are several types of heavy-tailed, alternatively referred to as fat-tailed, distributions$^3$ in wide use [175]. Alternatively, the contaminated normal distribution is one specific instance of the more generic mixture distribution model for outliers [93], which will suffice for the purposes of our current exposition. To be more specific, the outliers are generated through the contaminated-

2 When the Gaussian assumption concerning the noises is removed, the Kalman filter gives the linear minimum variance estimate of the state, namely the one with the smallest unconditional error covariance among all linear estimates, but, in general, this estimate is not the conditional mean (Goodwin and Sin, 1984).

3 A  fat  tail  is  a  property  of  some  probability  distributions  exhibiting  extremely  large  kurtosis  particularly  relative 
to  the  ubiquitous  Gaussian  which  itself  is  an  example  of  an  exceptionally  thin  tail  distribution.  Fat  tail  distributions 
have  power  law  decay. 
Figure 6.1: The Kalman Filter Flow Chart
[Diagram labels: innovation $e_{k+1} = y_{k+1} - \hat{y}_{k+1|k}$, used to “correct”; update $\hat{x}_{k+1|k+1} = \hat{x}_{k+1|k} + K_{k+1}e_{k+1}$; 1-step prediction $\hat{x}_{k+1|k} = F_k\hat{x}_{k|k} + G_k u_k$.]
$x_k, u_k, y_k, w_k, v_k$: the state, input, observation, process noise and observation noise vectors; $F_k, H_k, G_k$ and $J_k$: the state “transition”, observation and control matrices.

normal distribution with degenerate central component [174]

$$CN(\cdot \mid \gamma, \sigma^2) = (1-\gamma)\,N(\cdot \mid 0, 0) + \gamma\,N(\cdot \mid 0, \sigma^2) \qquad (6.9)$$

That is to say, the process $x_t$ is observed perfectly about $100(1-\gamma)$ percent of the time and is corrupted by outliers about $100\gamma$ percent of the time, where $0.01 < \gamma < 0.25$.
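A short sketch of how such outliers could be generated is given below. It only illustrates the mixture in (6.9); the function name `sample_outlier`, the seed and the values of `gamma` and `sigma` are assumptions made for this example.

```python
import random


def sample_outlier(gamma, sigma, rng):
    """Draw from (1 - gamma) * N(0, 0) + gamma * N(0, sigma^2)."""
    if rng.random() < gamma:
        return rng.gauss(0.0, sigma)   # contaminating component
    return 0.0                         # degenerate central component: no outlier


rng = random.Random(0)  # fixed seed, for reproducibility only
gamma, sigma = 0.1, 5.0
draws = [sample_outlier(gamma, sigma, rng) for _ in range(10000)]
fraction_corrupted = sum(d != 0.0 for d in draws) / len(draws)
```

With these illustrative values, roughly 10 percent of the draws are nonzero, matching the interpretation of $\gamma$ as the fraction of corrupted observations.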

6.2.4  Further  Property  Assumptions 

Furthermore, for some integer $d$, let $(\mathbb{R}^d, \mathcal{B}, \lambda)$ be a measure space, where $\mathbb{R}$ is the real line, $\mathcal{B}$ the Borel $\sigma$-algebra, and $\lambda$ the Lebesgue measure. Let $F$ be a zero-mean probability measure on $(\mathbb{R}^d, \mathcal{B})$ such that $F$ is absolutely continuous with respect to $\lambda$ and admits the density $f$ in accordance with the Radon-Nikodym theorem.

We have a sequence of independent and identically distributed (iid) observations $\{z_k\}$ of a random variable $Z$ with a probability density $p_\theta(Z)$ that depends on one scalar parameter only. The parameter is $\theta = \theta_0$ before an unknown change time $\nu$ and $\theta = \theta_1$ after $\nu$.

Note that the change time $\nu$ is unknown. We consider $\nu$ either as a nonrandom unknown value or as a random unknown value with unknown distribution. In other words, we take a nonparametric approach as far as this change time $\nu$ is concerned. In practice, either it is very difficult to have a priori information about the distribution of the change times, or this distribution is nonstationary (i.e. it has neither an invariant mean nor an invariant variance). This is particularly meaningful for our
Measurement Update (“Self-Correct”)

(1) Compute the Kalman gain

$$K_{k+1} = P_{k+1|k}H_{k+1}^T\big(H_{k+1}P_{k+1|k}H_{k+1}^T + R_{k+1}\big)^{-1}$$

where the inverted term is $\mathrm{cov}(e_{k+1})$

(2) Update the estimate with observation $y_{k+1}$

$$\hat{x}_{k+1|k+1} = \hat{x}_{k+1|k} + K_{k+1}e_{k+1}, \qquad e_{k+1} = y_{k+1} - \hat{y}_{k+1|k}$$

(3) Update the error covariance

$$P_{k+1|k+1} = (I - K_{k+1}H_{k+1})P_{k+1|k}$$

Time Update (1-step ahead “Predict”)

(1) Project the state ahead

$$\hat{x}_{k+1|k} = F_k\hat{x}_{k|k} + G_k u_k$$

(2) Project the error covariance ahead

$$P_{k+1|k} = F_k P_{k|k} F_k^T + Q_k$$

Figure 6.2: The recursive operation of the Kalman Filter: a combination of the high-level diagram in Fig. 6.1 and the formulations in Section 6.2.2



problem setting, given that we have no a priori knowledge of when the intrusion, and thus the outliers or anomalies, would occur, if at all. That is the reason why certain basic tools cannot directly suit our problem.

Our  security  model  is  that  the  SCADA  center  itself  is  secure  and  so  are  the  core  programs. 
We  assume  the  attack  is  session  based,  should  it  arise  over  the  network. 

By “resilient”, we stress the importance of the flexibility and parsimony of the overall strategy. Without incurring too large an overhead, it shall maintain the system's optimal performance under nominal conditions while striving for near-optimal performance should atypical situations arise, without being unduly affected by spurious observations.

6.2.5  Meaningful  Metrics  for  Recursive  Robust  Estimation 

It is only appropriate to bring up the issue of the robustness of estimation schemes when we address outliers. Conceptually, the definition of robustness$^4$ we use here stipulates that small changes from an assumed nominal model would only introduce small changes in the estimate, according to both Tukey [265] and Huber [213]. Furthermore, robust-resistant is a purely data-oriented notion defined by Tukey [266]: an estimate is called resistant if changing a small fraction of the data by large amounts results in little change to the estimate. That is to say, it captures the capability to withstand gross errors and outliers.

4 The word “robust” is loaded with many, and often inconsistent, meanings.

In terms of formulation, while the minimax approach is pessimistic, it provides an optimal lower bound on performance. Let $\mathcal{T}$ be a class of estimates, $\mathcal{F}$ a class of distributions, and $V(T,F)$ the asymptotic variance of $T \in \mathcal{T}$ when the distribution is $F \in \mathcal{F}$. Then the minimax robust estimate $T_0$ and its associated least favorable distribution $F_0$ satisfy

$$\min_{T \in \mathcal{T}} \max_{F \in \mathcal{F}} V(T,F) = V(T_0,F_0) = \max_{F \in \mathcal{F}} \min_{T \in \mathcal{T}} V(T,F) \qquad (6.10)$$

Naturally, this can be viewed as a game in which we choose $T \in \mathcal{T}$, nature chooses $F \in \mathcal{F}$, and $V(T,F)$ is the payoff. This game has a saddle point pair $(T_0, F_0)$ if $T_0$ and $F_0$ satisfy (6.10).
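The saddle-point condition can be illustrated on a finite toy game. The payoff table below is entirely hypothetical (it is not taken from any experiment in this work) and the function name `minimax_and_maximin` is an assumption of this sketch; it merely shows the two sides of (6.10) coinciding.

```python
def minimax_and_maximin(v):
    """v[i][j] is the payoff V(T_i, F_j); we pick the row, nature the column."""
    minimax = min(max(row) for row in v)
    maximin = max(min(v[i][j] for i in range(len(v))) for j in range(len(v[0])))
    return minimax, maximin


# Hypothetical asymptotic variances: rows are estimators, columns are distributions.
V = [
    [1.0, 3.0],   # efficient under the nominal model, poor under contamination
    [1.2, 1.5],   # slightly less efficient, but stable
    [2.0, 2.5],
]
upper, lower = minimax_and_maximin(V)
has_saddle_point = (upper == lower)
```

In this toy table the second estimator paired with the second distribution is the saddle point: it is the worst case for that estimator and, at the same time, the best response to that distribution.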

Furthermore,  for  multivariate,  dependent  Markovian  (state  space  model)  without  process  noises, 
analytically  the  asymptotic  variance  is  still  a  good  choice  of 

Moreover, the goal in this work is to optimally estimate and track the state of a stochastic time-variant linear dynamic system rather than to obtain the minimum asymptotic estimation error. Thus we target [241] approximations of a conditional mean estimator, which is known for its unbiasedness and minimum error variance [11].

6.2.6  Sequential  Detection  Performance  Measure 

False  Alarm  Constraints 

Often  the  methodology  of  optimal  change-point  detection  pursues  stopping  rules  that  achieve 
the  best  balance  of  the  mean  detection  delay  and  the  rate  of  false  alarms  or  minimize  the  mean 
delay  under  a  fixed  false  alarm  probability  [22].  In  order  to  establish  a  sound  sequential  detection 
performance  measure,  we  must  first  lay  out  the  associated  false  alarm  probability  constraints  that 
the  asymptotic  lower  bound  for  the  detection  delay  is  subject  to. 

$$E^{(\nu)}\big[(T-\nu)\,\mathbf{1}\{T>\nu\}\big] = E^{(\nu)}(T-\nu)^{+} \qquad (6.11)$$

Accordingly, three related false alarm probability constraints are listed below in ascending order of stringency:

• For iid observations, due to Shiryaev [248], the Bayesian view concerns the mean delay to detection under the average false alarm constraint

$$P(T<\nu) = \sum_{k=1}^{\infty} \pi_\alpha(k)\,P_0(T<k) \le \alpha \qquad (6.12)$$

where $\pi_\alpha$ is a prior distribution of the change time $\nu$.

• The ARL (Average Run Length) [205] to false alarm constraint in a minimax formulation,

$$E_0[T] \ge \gamma > 1 \qquad (6.13)$$

is the worst case in Lorden's sense [166]: the ARL to false alarm is no smaller than a given number $\gamma > 1$ when the quality parameter remains fixed at $\theta_0$. The objective is to find the stopping rule that minimizes the worst-case delay subject to an upper bound on the false alarm rate.

• For non-independent observations, Lai proposed a change-of-measure argument [155], the most stringent one among the three, to guarantee a lower bound on the window-limited stopping time, or the detection delay:

$$\sup_{\nu \ge 1} P_0(\nu \le T < \nu + m_\alpha) \le \alpha, \quad \text{where} \quad \liminf \frac{m_\alpha}{|\log\alpha|} > I^{-1} \quad \text{but} \quad \log m_\alpha = o(\log\alpha) \ \text{as} \ \alpha \to 0. \qquad (6.14)$$

The reason we choose the most stringent false alarm constraint, namely Lai's change-of-measure argument (6.14), is that it meets our desire to keep the false alarm probability as low as possible while achieving an asymptotic lower bound for the detection delay.

Correspondingly, as $\alpha \to 0$, for a positive integer $I$, the asymptotic lower bound for the detection delay is

$$E^{(\nu)}(T-\nu)^{+} \ge \{P_0(T \ge \nu)/I + o(1)\}\,|\log\alpha| \quad \text{uniformly in } \nu \ge 1. \qquad (6.15)$$
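To get a feeling for the size of the bound in (6.15), the leading term can be evaluated numerically. The sketch below drops the $o(1)$ term and takes $P_0(T \ge \nu)$ to be 1; the value $I = 2$ is an assumed example (it would correspond, for instance, to a mean shift of size 2 in unit-variance Gaussian noise), and the function name `delay_lower_bound` is hypothetical.

```python
import math


def delay_lower_bound(alpha, info_number, p_no_false_alarm=1.0):
    """Leading term of (6.15): P_0(T >= nu) / I * |log alpha|, with o(1) dropped."""
    return p_no_false_alarm / info_number * abs(math.log(alpha))


# Assumed example value of I; see the lead-in for where it could come from.
info = 2.0
bounds = {alpha: delay_lower_bound(alpha, info) for alpha in (1e-2, 1e-3, 1e-4)}
```

The bound grows only logarithmically as the false alarm probability $\alpha$ is tightened, which is why a stringent false alarm constraint remains affordable in terms of detection delay.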


6.3  Resilient  Estimation 


Contaminated Observations with additive outliers. Suppose that at an unknown time $\nu$, the sensor measurement (observation) $y_k$ (6.8) is subject to some additive outliers or anomaly, formally

$$\tilde{y}_k = y_k + y_{ao,k}\,\mathbf{1}\{k \ge \nu\} \qquad (6.16)$$

$$y_k = H_k x_k + J_k u_k + v_k \qquad (6.17)$$

$$\tilde{y}_k = H_k x_k + J_k u_k + v_k + y_{ao,k}\,\mathbf{1}\{k \ge \nu\} \qquad (6.18)$$

where $\tilde{y}_k$ is the observed data and the $y_{ao,k}$ are the additive outliers, either in isolation or in cluster. $\mathbf{1}\{k \ge \nu\}$ is a compact notation for an indicator function indicating the occurrence of the outliers (anomaly),

$$\mathbf{1}\{k \ge \nu\} = \begin{cases} 1, & k \ge \nu \\ 0, & k < \nu \end{cases} \qquad (6.19)$$
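The contamination model with its indicator function amounts to the following small sketch. The function name `contaminate` and the sample values are illustrative assumptions; time indices are taken to start at 1.

```python
def contaminate(y, outliers, nu):
    """Return y_k + y_ao,k * 1{k >= nu} for k = 1, ..., len(y)."""
    return [yk + (ao if k >= nu else 0.0)
            for k, (yk, ao) in enumerate(zip(y, outliers), start=1)]


clean = [1.0, 1.0, 1.0, 1.0]
additive = [5.0, 5.0, 5.0, 5.0]
observed = contaminate(clean, additive, nu=3)   # outliers act from k = 3 onwards
```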


Theorem 1. A robust state estimate that satisfies the above conditions is optimal in the min-max sense, i.e. it has minimum variance over the least favorable contaminating distributions. It can take the following form with $\hat{x}_{k|k} = E[x_k \mid y_k]$, compared to the original Kalman filter.

$$\hat{x}_{k+1|k} = F_{k+1}\hat{x}_{k|k} \qquad (6.20)$$

$$P_{k+1|k} = F_{k+1}P_{k|k}F_{k+1}^T + Q_{k+1} \qquad (6.21)$$

$$K_{k+1} = P_{k+1|k}H_{k+1}^T\tilde{\Sigma}_{k+1}^{-1} \qquad (6.22)$$

$$\hat{x}_{k+1|k+1} = \hat{x}_{k+1|k} + K_{k+1}(y_{k+1} - H_{k+1}\hat{x}_{k+1|k} - J_{k+1}u_{k+1}) \qquad (6.23)$$

$$P_{k+1|k+1} = (I - K_{k+1}H_{k+1})P_{k+1|k} \qquad (6.24)$$

with the robustified (censored) covariance matrix of the innovation (residual) becoming

$$\tilde{\Sigma}_k = H_k P_{k|k-1} H_k^T + R_k^{1/2} W_k R_k^{1/2} \qquad (6.25)$$

where

$$W_k = \mathrm{diag}\{w_{1k}, \cdots, w_{mk}\} \qquad (6.26)$$

and $w_{1k}, \cdots, w_{mk}$ are defined later in the proof.


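Proof: We first show the result through construction. It is straightforward that the state estimator $\hat{x}_{k|k}$ corresponding to $\hat{x}_{k|k} = E[x_k \mid y_k, u_k]$ of the original Kalman filter can be obtained by minimizing

$$\hat{x}_{k+1|k+1} = \arg\min \big\{ (\hat{x}_{k+1|k} - x_{k+1})^T (P_{k+1|k})^{-1} (\hat{x}_{k+1|k} - x_{k+1}) + (\tilde{y}_{k+1} - H_{k+1}x_{k+1} - J_{k+1}u_{k+1})^T (R_{k+1})^{-1} (\tilde{y}_{k+1} - H_{k+1}x_{k+1} - J_{k+1}u_{k+1}) \big\} \qquad (6.27)$$

with respect to $x_{k+1} \in \mathbb{R}^n$, where $\tilde{y}_{k+1}$ is the observed data, or equivalently

$$\hat{x}_{k|k} = \arg\min \Big\{ \sum_{i=1}^{n} (p_{ik} - a_{ik}x_k)^2 + \sum_{j=1}^{m} (s_{jk} - b_{jk}x_k - q_{jk})^2 \Big\} \qquad (6.28)$$

where $p_k = (P_{k|k-1})^{-1/2}\hat{x}_{k|k-1}$, $s_k = (R_k)^{-1/2}\tilde{y}_k$, $q_k = (R_k)^{-1/2}J_k u_k$, $a_k = (P_{k|k-1})^{-1/2}$ and $b_k = (R_k)^{-1/2}H_k$, so that $p_{ik}$, $s_{jk}$ and $q_{jk}$ are the $i$-th (respectively $j$-th) components of the vectors $p_k \in \mathbb{R}^{n \times 1}$, $s_k \in \mathbb{R}^{m \times 1}$ and $q_k \in \mathbb{R}^{m \times 1}$; $a_{ik} \in \mathbb{R}^{1 \times n}$ and $b_{jk} \in \mathbb{R}^{1 \times n}$ are the $i$-th (respectively $j$-th) row vectors of the matrices $a_k \in \mathbb{R}^{n \times n}$ and $b_k \in \mathbb{R}^{m \times n}$. In the case of M-estimation, the least squares solution is replaced by

$$\hat{x}_{k|k} = \arg\min \Big\{ \sum_{i=1}^{n} (p_{ik} - a_{ik}x_k)^2 + \sum_{j=1}^{m} \rho_j(s_{jk} - b_{jk}x_k - q_{jk}) \Big\} \qquad (6.29)$$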
where the $\rho_j$ are suitable score functions whose derivatives $\psi_j$ are the influence functions, or psi-functions, used in robust statistics. One of Huber's psi-functions,

$$\psi_H(z) = \begin{cases} z, & |z| \le s \\ s\,\mathrm{sgn}(z), & |z| > s \end{cases} \qquad (6.30)$$

is often used$^5$. It gives robust estimates of location which are optimal in the min-max sense, having minimum variance over the least favorable contaminating distributions.
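Huber's psi-function is simple enough to state in a few lines. In the sketch below the function name `huber_psi` is an assumption, and the default clipping point merely reuses the example value $s = 1.883$ quoted in the footnote on the choice of $s$.

```python
def huber_psi(z, s=1.883):
    """Huber's psi-function: identity inside [-s, s], clipped to +/- s outside."""
    if abs(z) <= s:
        return z
    return s if z > 0 else -s


small = huber_psi(0.5)      # inside the linear region, passed through unchanged
large = huber_psi(10.0)     # a gross residual, clipped to s
```

Small residuals are treated exactly as in least squares, whereas the influence of a gross residual is bounded by $s$, which is what limits the effect of a single outlier on the estimate.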

The normal equations for $\hat{x}_{k|k}$ corresponding to (6.29) have the form

$$\sum_{i=1}^{n} a_{ik}^T (p_{ik} - a_{ik}\hat{x}_{k|k}) + \sum_{j=1}^{m} b_{jk}^T\,\psi_j(s_{jk} - b_{jk}\hat{x}_{k|k} - q_{jk}) = 0 \qquad (6.31)$$

and can be solved explicitly only in some special cases. This is quite pragmatic as well, since sensors are normally set with bounded values in practice.

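Alternatively, one can use the approximated normal equations, in which $\hat{x}_{k|k}$ is approximated$^6$ by $\hat{x}_{k|k-1}$ inside the weight functions $w_{jk}$:

$$\sum_{i=1}^{n} a_{ik}^T (p_{ik} - a_{ik}\hat{x}_{k|k}) + \sum_{j=1}^{m} w_{jk}\,b_{jk}^T (s_{jk} - b_{jk}\hat{x}_{k|k} - q_{jk}) = 0 \qquad (6.32)$$

where the weight functions $w_{jk}$, $j = 1, \ldots, m$, are

$$w_{jk} = \frac{\psi_j(s_{jk} - b_{jk}\hat{x}_{k|k-1} - q_{jk})}{s_{jk} - b_{jk}\hat{x}_{k|k-1} - q_{jk}} \qquad (6.33)$$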


Using (6.32) and some algebra, we obtain the robustified (censored) covariance matrix of the innovation (residual),

$$\tilde{\Sigma}_k = H_k P_{k|k-1} H_k^T + R_k^{1/2} W_k R_k^{1/2} \qquad (6.34)$$

where $W_k = \mathrm{diag}\{w_{1k}, \cdots, w_{mk}\}$.
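The effect of the weights can be seen in a scalar sketch of the measurement update. It solves the weighted normal equation (6.32) directly for one state and one sensor, with the weight evaluated at the prediction as in (6.33). The function names, the default noise variance `r`, the clipping point `s` and the omission of the input term are all assumptions of this illustration, not the implementation used in the experiments.

```python
import math


def huber_weight(residual, s=1.883):
    """w = psi(residual) / residual for Huber's psi; equals 1 for small residuals."""
    if abs(residual) <= s:
        return 1.0
    return s / abs(residual)


def robust_update(x_pred, p_pred, y, h=1.0, r=0.1, s=1.883):
    """Scalar measurement update solving the weighted normal equation (6.32)."""
    b = h / math.sqrt(r)                         # scaled observation coefficient
    scaled_residual = y / math.sqrt(r) - b * x_pred
    w = huber_weight(scaled_residual, s)         # evaluated at the prediction
    precision = 1.0 / p_pred + w * b * b         # a^2 + w * b^2 with a^2 = 1 / p_pred
    x_new = x_pred + (w * b / precision) * scaled_residual
    return x_new, 1.0 / precision, w


x_nom, p_nom, w_nom = robust_update(0.0, 1.0, 0.1)    # nominal observation
x_out, p_out, w_out = robust_update(0.0, 1.0, 10.0)   # gross outlier
```

For the nominal observation the weight is 1 and the update coincides with the standard Kalman update; for the outlier the weight drops below 1, so the estimate moves far less than the standard update would move it.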


6.4  Robust  Outlier  Detection 

The overall procedure is shown in Figure 6.3.

6.4.1  System  model  with  outliers  contaminated  observations 

Following the definition of the contaminated measurement (6.16)-(6.19), the state $x_k$, the estimate $\hat{x}_{k|k}$, and the output residual $e_k$ of the Kalman filter, once outliers occur at time $\nu$, can be expressed in terms of their nominal counterparts as

$$\tilde{x}_{k|k} = \hat{x}_{k|k} + \beta(k,\nu)\,y_{ao}, \qquad \tilde{e}_k = e_k + \rho(k,\nu)\,y_{ao}, \qquad (6.35)$$

where the terms $\beta(k,\nu)$ and $\rho(k,\nu)$ are defined below.

5 The recommended choice of $s$ in (6.30) is $s = u_{1-\alpha}$, where $u_\alpha$ is the $\alpha$-quantile of $N(0,1)$ (e.g., $s = 1.883$ for a 3% contamination of the data).

6 They can be considered as a recursive variant of the normal equations from the Iterative Weighted Least Squares (IWLS) method, which is a popular algorithm for the numerical calculation of M-estimates.

Figure 6.3: Block Diagram of Robust Outlier Detection and Resilient Estimation

Conditioned on the past outputs and input signals $u_k$, the innovation $e_k$ has the conditional mean $E[e_k]$. Denote $\mu_k = E[e_k]$; then

$$\mu_k = \begin{cases} 0, & k < \nu \\ \rho(k,\nu)\,y_{ao}, & k \ge \nu \end{cases} \qquad (6.36)$$

where $\nu$ and $y_{ao}$ are unknown. The $\rho(k,t)$ are matrices that can be evaluated recursively after the initialization $\rho(t,t) = 0$, $\beta(t-1,t) = 0$,

$$\beta(k,t) = F_k\,\beta(k-1,t) + K_k\,\rho(k,t) \qquad (6.37)$$

$$\rho(k+1,t) = -H_{k+1}F_{k+1}\,\beta(k,t) + I \qquad (6.38)$$

where $\beta(k,t)$ and $\rho(k,t)$ are the differences of the estimate $\hat{x}_{k|k}$ and the residual $e_k$ under outliers from their nominal counterparts, as stated in (6.35). They are evaluated recursively in parallel for $k > t$ and for every fixed $t$, one for each $t$ within a moving window $t \in \{n-m, \ldots, n-m'\}$.
Meanwhile, the covariance matrix of the innovation is

$$V_k = E\big[(e_k - E[e_k])(e_k - E[e_k])^T\big] \qquad (6.39)$$

$$V_k = \begin{cases} \tilde{\Sigma}_k, & k \ge \nu \\ \Sigma_k, & k < \nu \end{cases} \qquad (6.40)$$

It is easy to verify the design purpose: for $k < \nu$ the weight functions satisfy $w_{jk} = 1$, $\forall j \in [1, m]$, and thus $\tilde{\Sigma}_k = \Sigma_k$.

6.4.2 Robust Sequential Probability Ratio Tests

According to Huber [119], a statistical procedure is called robust if its performance is insensitive to small deviations from the idealized theoretical model. In terms of the robustness of a test, it shall withstand small arbitrary departures from both the null hypothesis (robustness of validity) and the specified alternatives (robustness of efficiency) [120]. When encountering deviations, the classical probability ratio test is not robust in the following sense: a single outlying data point, and thus a deviating factor $p_1(x_j)/p_0(x_j)$ equal (or almost equal) to 0 or $\infty$, may unduly impact the test statistic $T(x) = \prod_{j=1}^{n} p_1(x_j)/p_0(x_j)$ and may therefore totally skew the final hypothesis or probability test outcome. By censoring the single factors at some fixed numbers $c' < c''$ for the sequential probability ratio test, one can replace the test statistic by $T'(x) = \prod_{j=1}^{n} \kappa(x_j)$, where

$$\kappa(x_j) = \max\{c', \min\{c'', p_1(x_j)/p_0(x_j)\}\} \qquad (6.41)$$

Note that we have done precisely this in the resilient estimation stage: one of the key components of our test statistic, the covariance matrix of the innovation (residual), $\tilde{\Sigma}_k$ (6.34) or $V_k$ (6.39), has been “censored”.

Detection Rules

Without assuming any prior knowledge of the parameter $\eta$, the RGLR rule maximizes the log likelihood ratio over a window of inputs and decides the time to raise an alarm according to a certain rule, which we state without formal proof, as certain steps have been shown by Huber [119] and Quang [223] in a sequential testing setting.

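Theorem 2. The following stopping rule is optimal and robust:

$$N_W = \inf\Big\{ n : \max_{n-M \le t \le n-M'} \sup_{\eta} \sum_{i=t}^{n} \log\Big[ f\big(\tilde{\Sigma}_i^{-1/2}(e_i - \rho(i,t)\eta)\big) \big/ f\big(\tilde{\Sigma}_i^{-1/2}e_i\big) \Big] \ge c \Big\}$$

$$= \inf\Big\{ n : \max_{n-M \le t \le n-M'} \Big(\sum_{i=t}^{n} \rho^T(i,t)\tilde{\Sigma}_i^{-1}e_i\Big)^T \Big(\sum_{i=t}^{n} \rho^T(i,t)\tilde{\Sigma}_i^{-1}\rho(i,t)\Big)^{-1} \Big(\sum_{i=t}^{n} \rho^T(i,t)\tilde{\Sigma}_i^{-1}e_i\Big) \Big/ 2 \ge c \Big\} \qquad (6.42)$$

where $f(y) = e^{-\|y\|^2/2}/(2\pi)^{\zeta/2}$ denotes the $\zeta$-dimensional normal density, $\zeta = \dim(y_1)$, and $M' + 1 > \zeta$ so that the matrix inversions in (7.28) are valid.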

In essence, we are looking at an optimal stopping time problem: not to stop too early and produce a false alarm, nor to stop too late and miss a real anomalous event.
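A scalar version of a window-limited stopping rule of this kind can be sketched as follows. The sketch assumes one-dimensional residuals with a constant variance and takes $\rho(i,t) = 1$, so that the quadratic form reduces to a squared windowed sum; the function name and the sample data are illustrative only.

```python
def window_glr_stop(residuals, variance, window, min_window, threshold):
    """Return the first n (1-based) at which the window-limited GLR statistic
    reaches the threshold, or None. Scalar case with rho(i, t) = 1 assumed."""
    for n in range(1, len(residuals) + 1):
        best = 0.0
        # Candidate change times t with n - window <= t <= n - min_window.
        for t in range(max(1, n - window), n - min_window + 1):
            total = sum(residuals[t - 1:n])          # sum of e_i for i = t..n
            info = (n - t + 1) / variance            # sum of rho^2 / variance
            best = max(best, (total / variance) ** 2 / info / 2.0)
        if best >= threshold:
            return n
    return None


# Residuals are small until a shift of about 3 appears at the fifth sample.
e = [0.1, -0.2, 0.05, 0.0, 3.0, 3.1, 2.9]
alarm_time = window_glr_stop(e, variance=1.0, window=5, min_window=1, threshold=5.0)
```

With these sample values the alarm is raised one sample after the shift begins, once two shifted residuals have accumulated in the window.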

Huber  [119]  showed  that  in  the  neighborhoods  of  the  idealized  underlying  distributions,  which 
is  the  least  favorable  situation  for  both  Type  I  (false  alarm)  and  Type  II  (miss  detection)  error 
probabilities,  the  so  called  censored  probability  ratio  test  is  most  robust  in  a  well  defined  minimax 
sense. 

Given that our test statistic has undergone the censoring process at the robustified estimation stage, our concern translates into whether the corresponding sequential test is still least favorable for errors.

Quang [223] further proved that, with the limiting maximum error probabilities less than 1/2, such a sequential test is also least favorable for the Average Sample Number (ASN) and asymptotically minimax with respect to expected sample sizes.
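The censoring of individual likelihood-ratio factors discussed in this section can be illustrated with two Gaussian hypotheses. The means, the common standard deviation and the censoring bounds below are assumed example values, and the function names are hypothetical.

```python
import math


def likelihood_ratio(x, mean0, mean1, std):
    """p1(x) / p0(x) for two Gaussians with a common standard deviation."""
    return math.exp(((x - mean0) ** 2 - (x - mean1) ** 2) / (2.0 * std ** 2))


def censored_ratio_statistic(samples, c_low, c_high, mean0=0.0, mean1=1.0, std=1.0):
    """Product of the factors p1/p0, each clipped to the interval [c_low, c_high]."""
    stat = 1.0
    for x in samples:
        ratio = likelihood_ratio(x, mean0, mean1, std)
        stat *= max(c_low, min(c_high, ratio))
    return stat


samples = [0.5, 0.5, 50.0]                      # the last point is a gross outlier
censored = censored_ratio_statistic(samples, c_low=0.1, c_high=10.0)
uncensored = censored_ratio_statistic(samples, c_low=0.0, c_high=float("inf"))
```

Without censoring, the single outlying point dominates the product; with censoring, its contribution is capped at the upper bound, so no single observation can decide the test on its own.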

6.4.3  Threshold  and  Window  size  Choice 

Note that (7.27) computes $\rho(t,k)$ recursively over each window. How to optimally choose $M$, $M'$ and $c$ in general is a difficult problem [25] for online practice, particularly due to the coupling effect between the threshold and the window size on the asymptotic performance of the detection rule. For off-line operations, however, the choice of window size is less demanding: as the whole data set is available, it is only a matter of computation time.

The threshold $c$ in the rule $N_W$ subject to the false alarm probability criterion on $P_0(N_W \le m)$ can be computed by using Monte Carlo computation of $P_0(N_W \le m)$ together with the method of successive linear approximation combined with bisection search for iterative solution of the resulting equation in $P_0(N_W \le m)$.

With  the  window  size  M,  we  have  M  ~  alogy  where  Eq(T)  ~  y,  and  a  >  /(fj  Q) .  The  importance 
sampling procedure for Monte Carlo computation of $P_0(N_W \le m)$ involves the following steps, as shown in Algorithm 1.
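Algorithm 1 Importance Sampling for $P_0$
while $N > 0$ do {run $N$ times}
    generate $\nu \in \{1, \ldots, m\}$ and $\theta \sim N(0, I)$
    for $t \le \min(N_W, m)$ do
        if $t < \nu$ then
            $\mathrm{cov}_t(e_t) \leftarrow V_t$
            $E_t(e_t) \leftarrow 0$
        else
            $\mathrm{cov}_t(e_t) \leftarrow V_t$
            $E_t(e_t) \leftarrow \rho(t,\nu)\,\theta$
        end if
        for $1 \le k \le i \le t \le m$ do
            $C_{t,k} \leftarrow I + \sum_{i=k}^{t} \rho^T(i,k)V_i^{-1}\rho(i,k)$
            $d_{t,k} \leftarrow \sum_{i=k}^{t} \rho^T(i,k)V_i^{-1}e_i$
            $L_t \leftarrow m^{-1}\big[\sum_{k=1}^{t} (\det C_{t,k})^{-1/2}\exp(d_{t,k}^T C_{t,k}^{-1} d_{t,k}/2) + m - t\big]$
        end for
    end for
    $N \leftarrow N - 1$
end while
$P_0(T \le m)$
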


Note that $E_0(T) \approx m / P_0(T \le m)$. Thus the threshold $c$ in the rule $N_W$ subject to the false alarm probability criterion $P_0(N_W \le m) = m/\gamma$ can be computed by using the above procedure for Monte Carlo computation of $P_0(N_W \le m)$, together with the method of successive linear approximation combined with bisection search for iterative solution of the equation $P_0(N_W \le m) = m/\gamma$.
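The idea of tuning the threshold against a simulated false alarm probability can be sketched in a simplified form. The code below is not the importance sampling procedure of Algorithm 1 and does not use successive linear approximation: it substitutes plain Monte Carlo with a fixed seed and plain bisection, for a scalar window-limited rule with unit-variance residuals. All names, the window length, the number of runs and the target value are assumptions of this illustration.

```python
import random


def false_alarm_prob(c, m, runs=200, window=5, seed=1):
    """Plain Monte Carlo estimate of P_0(N <= m) for a scalar window-limited
    GLR rule with unit-variance residuals (not the importance sampler)."""
    rng = random.Random(seed)            # same seed: common random numbers
    alarms = 0
    for _ in range(runs):
        e = [rng.gauss(0.0, 1.0) for _ in range(m)]
        stopped = False
        for n in range(1, m + 1):
            for t in range(max(1, n - window + 1), n + 1):
                total = sum(e[t - 1:n])
                if total * total / (2.0 * (n - t + 1)) >= c:
                    stopped = True
                    break
            if stopped:
                break
        alarms += stopped
    return alarms / runs


def solve_threshold(target, m, lo=0.0, hi=20.0, iterations=20):
    """Bisection on c; the estimate is non-increasing in c for a fixed seed."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if false_alarm_prob(mid, m) > target:
            lo = mid      # too many false alarms: raise the threshold
        else:
            hi = mid
    return hi


c = solve_threshold(target=0.05, m=20)
```

Reusing the same seed for every evaluation makes the estimated probability a monotone step function of the threshold, which is what allows a bisection search to converge despite the simulation noise.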


6.5  Experiments  and  Evaluation 

Currently, we are using synthetic data to conduct experiments. We model the discrete dynamics and two-dimensional measurement of the tracked object as

$$x_{t+1} = A_e x_t + w_t$$

$$y_{i,t} = C_i x_t + v_{i,t} \qquad (6.43)$$

where $w$ and $v$ are white Gaussian noises with zero mean and covariances $Q_e = \mathrm{diag}(0.15^2, 0.15^2, 0.15^2, 0.15^2)$ and $R_i = R = \mathrm{diag}(0.15^2, 0.15^2)$, and $\delta = 0.5$ is the sampling period.
The reasons for employing such an example are that

• its multidimensionality provides sufficient complexity;

• it is generic enough to illustrate the impact of outliers.

6.5.1  Resilient  Estimation  Performance 

As stated in Section 6.2.5, we evaluate the estimation performance in terms of the error variance. Figure 6.4 shows that our resilient estimation scheme performs better than the standard Kalman filter upon randomly injected outliers while matching the latter's performance under nominal conditions.


[Figure 6.4 panels: upper, “Tracking Error in the Standard Kalman Filter”; lower, “Tracking Error of the Resilient Estimation”; each panel shows the estimation error in the X-direction and in the Y-direction.]

Figure 6.4: Tracking Error Comparison: The lower panel shows that the performance of our Resilient Estimation is identical to that of the standard Kalman filter under nominal conditions while having much smaller errors upon outliers at time T = 10, 30, 60.


6.5.2  Robust  Outlier  Detection  Performance 

With randomly injected outliers, where the false alarm constraint is achieved through Monte Carlo simulation, our approach successfully detects multiple outliers, as shown in Figure 6.5.

6.5.3  Limitation  and  Discussion 

As  Pearson  discussed  in  [209],  the  MT- filter  used  in  this  work  can  be  inapplicable  when  the 
covariance  matrix  on  which  the  Kalman  filter  is  based  becomes  singular.  One  way  to  deal  with 
singular  covariance  matrices  for  the  Kalman  filter  is  to  use  Singular  Value  Decomposition  [61, 
283]. 
6.6 Discussion

The deployment of large-scale WSN profoundly changes the operation of SCADA systems. While such advancement facilitates convenience and efficiency, it also exposes SCADA systems and WSN to more potential uncertainty if the reliability and security aspects are not well addressed. We take the first steps, namely resilient estimation, towards the concept and realization of resilient control, which stipulates maintaining the optimality of standard operations under nominal conditions and adapting to abnormal situations by alleviating their impact. We also present an online robust outlier detection scheme that is optimal according to a stringent performance measure. Furthermore, this is accomplished without incurring large overhead. Future work lies in the direction of implementing these methodologies on real data.


[Figure 6.5 panels: (a) Detection of 3 outliers; (b) Detection of 4 outliers. Panel titles: “Outliers are detected at T = 11, 32, 57, 80” and “Outliers are detected at T = 10, 30, 03”.]

Figure 6.5: Detection of Multiple Outliers
Chapter 7

Revisit  Dynamic  ARIMA-Based  Anomaly 
Detection 


A detailed application of RGLRT is given in this chapter. The Autoregressive Integrated Moving Average (ARIMA) time series model finds wide usage in natural, social, economic and network applications. Model building and anomaly detection based on such models are often a first and important step towards monitoring unexpected problems and assuring the soundness and security of the systems being studied. The time variability of the coefficients in those dynamic regression models is particularly relevant and possibly indicative. Thus we introduce a corresponding framework and a novel anomaly detection approach, based on the Kalman filter for identifying those dynamic models, including their parameters, and on a General Likelihood Ratio (GLR) test for detecting suspicious changes in the parameters and therefore in the models. We illustrate the idea through experiments and show its promising potential in terms of accuracy and robustness.

One of the most popular time series techniques is the Autoregressive Integrated Moving Average (ARIMA) [37, 106, 36, 39] model, due to its versatility in capturing dynamics and forecasting. Because model building lays the foundation for anomaly detection [158], a fair share of the work on machine learning, signal processing and time-series analysis is devoted to detecting outliers or anomalies in time series, and in ARIMA models specifically [237]. The existence of anomalies in ARIMA models and their detection arise in a variety of settings including, but not limited to, natural [108, 184], social [63, 273], economic [197, 73, 163, 8], network service [281, 288] and network security [151, 291, 284, 91, 231] applications. The time-varying structural parameters not only possibly challenge the model fidelity [264], and thus undermine the intended effectiveness of its usage, but also likely reflect the intrinsic nature of a system that evolves over time [203]. More specifically, any sudden change of these parameters is an indication of some atypical behavior within the system, including benign faults [25] and/or malicious attacks [131]. In particular, in the arena of network security, network traffic anomalies may occur due to security threats such as Distributed Denial of Service (DDoS) attacks and network worms.

The work on network anomography [291] by Zhang et al. inspired our extension. According to their investigation, one of the most successful and robust methods for detecting network traffic anomalies combines Box-Jenkins modeling (ARIMA) with $L_1$ norm minimization.
The CUSUM (Cumulative Sum) method and its variants are widely used for anomaly detection. As pointed out in [25, 254], its major drawback is that it requires a priori knowledge of the post-change behavior, e.g. the intensity of the anomaly. In practice, such information is not predictable.

We look at the problem from a novel angle and take advantage of the by-product of the parameter learning and estimation process in the ARIMA model-building stage to pre-screen possible anomalies without incurring a drastic extra computational burden. This also prevents those anomalies from poisoning the model- and baseline-building from the start.

Our goal is to find a quick way to detect such anomalies, which manifest themselves as changes in the system model. The identification and estimation of the parameters of ARIMA models is often the first step before any further analysis and can often be achieved through maximum likelihood estimation. The exact likelihood is computed via a state-space representation of the ARIMA process, with the innovations and their variance found by a Kalman filter [139]. We use a General Likelihood Ratio (GLR) test [277, 25], which does not require any a priori knowledge of the anomalies, for detecting suspicious changes in the parameters and therefore in the models. Along with the Kalman filter [139], this GLR procedure also adaptively filters the ARIMA parameter estimation in case of missing anomalous observations.

Organization of the paper: We first review the procedure of ARIMA-based anomaly detection in Section 2, with emphasis on model building and its transition to a state-space model in which the Kalman filter facilitates model estimation and anomaly detection. In Section 3 we describe the GLR test for identifying sudden changes in a dynamic ARIMA model. We then illustrate the idea through simulation experiments in Section 4 before concluding in Section 5.


7.1  ARIMA  Modeling 

While we address the derivation of model building through a concrete example of anomaly detection at the network level, it is worth pointing out that the methodology is applicable to other situations.

The link traffic and Origin-Destination (OD) traffic matrix follow

$$b_j = A_j x_j \qquad (7.1)$$

where $A_j$ is an $m \times n$ routing matrix, $x_j$ is a length-$n$ vector of unknown OD flow traffic volumes, and $b_j$ is a length-$m$ vector of link loads$^1$, at time interval $j$.

We first assume that the routing matrices $A_j$ are time-invariant and denote them by $A$. Then we can combine all $t$ linear systems (7.1) into a single equation

$$B = AX, \qquad (7.2)$$

where $B = [b_1\ b_2\ \cdots\ b_t]$ is the link traffic data over time $t$, having the $b_j$ as its column vectors, and similarly $X = [x_1\ x_2\ \cdots\ x_t]$.
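As a small illustration of the relation between link loads and OD flows, consider the toy network below. The routing matrix and the traffic volumes are hypothetical values chosen only to make the matrix product easy to follow.

```python
# Hypothetical 3-link, 2-flow network: rows are links, columns are OD flows.
A = [
    [1, 0],   # link 1 carries flow 1 only
    [1, 1],   # link 2 carries both flows
    [0, 1],   # link 3 carries flow 2 only
]
# X has one column per time interval: OD volumes at intervals 1, 2, 3.
X = [
    [10, 12, 11],
    [5, 4, 30],   # flow 2 spikes in the last interval
]
# B = A X: one row per link, one column per time interval.
B = [[sum(A[i][k] * X[k][j] for k in range(len(X))) for j in range(len(X[0]))]
     for i in range(len(A))]
```

The spike in the second flow shows up on every link that carries it, which is why anomalies in the unobserved OD flows can be inferred from the observed link loads.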

1 Note that the link load vector $b_j$ also includes the aggregated traffic at different ingress/egress points; the corresponding rows in $A_j$ encode the OD flows that enter/exit the network at these points.




In the notation introduced by Box and Jenkins [37], models are summarized as ARIMA($p$, $d$, $q$). A model described as ARIMA(0, 1, 2) means that it contains $p = 0$ (zero) autoregressive parameters and $q = 2$ moving-average parameters, which were computed for the time series after it was differenced once ($d = 1$).


7.1.1  Time  Series  Expression 

A general ARIMA model of order $(p, d, q)$ can be expressed as:

$$z_k - \sum_{i=1}^{p} \phi_i z_{k-i} = e_k - \sum_{j=1}^{q} \theta_j e_{k-j} \qquad (7.3)$$

where $z_k$ is obtained by differencing the original time series $d$ times (when $d \ge 1$) or by subtracting the mean from the original time series (when $d = 0$), $e_k$ is the forecast error at time $k$, and $\phi_i$ ($i = 1, \ldots, p$) and $\theta_j$ ($j = 1, \ldots, q$) are the autoregression and moving-average coefficients, respectively. Let $I$ denote the $t \times t$ identity matrix, $\nabla$ denote the backshift matrix and $\mathbf{1}$ denote the $t \times t$ unity matrix with each entry equal to 1.

{B(i~vy  v  = 

E  —  BT,  where  the  transformation  matrix  (7.5) 

T  =  (7.6) 

(/  -  v)^/  -  Ef=1  ^v‘)(/  -  £?,,  d  >  l 
(/-71)(/-Ef,iivi)(/-E;=10jV-')  rf  =  o 

In  terms  of  the  classical  ARIMA  techniques  used  for  anomaly  detection  ,  the  forecast  errors 
indicate  anomalous  link  traffic,  B  —  E.  That  is,  traffic  behavior  that  cannot  be  well  captured  by 
the  model  is  considered  anomalous. 


d>  1 


d  =  0 


(7.4) 
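A small worked case may help in reading the back-shift matrix. It assumes the convention that $\nabla$ has ones on its first superdiagonal, so that right-multiplying a row of link loads delays it by one interval; the size $t = 3$ is chosen only for illustration.

```latex
\nabla = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}, \qquad
b = \begin{pmatrix} b_1 & b_2 & b_3 \end{pmatrix}
\;\Longrightarrow\;
b\nabla = \begin{pmatrix} 0 & b_1 & b_2 \end{pmatrix}, \qquad
b(I - \nabla) = \begin{pmatrix} b_1 & b_2 - b_1 & b_3 - b_2 \end{pmatrix}.
```

Under this convention, an ARIMA(0, 1, 0) model has $T = I - \nabla$, so the forecast errors are simply the first differences of the link loads.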


7.1.2  State-Space  Representation 

The  discrete  time  linear  dynamical  system  and  measurement  model  are  the  following,  where  i 
is  the  index  of  sensors. 


$$ x_{t+1} = A_t x_t + w_t \tag{7.7} $$

$$ y_t = C_t x_t + v_t \tag{7.8} $$

where $x_t \in \mathbb{R}^s$ is the state vector, $y_t \in \mathbb{R}^o$ is the output vector, $w_t \in \mathbb{R}^s$ is white Gaussian noise
with zero mean and covariance $Q > 0$, and the $v_t$'s $\in \mathbb{R}^o$ are white Gaussian noises with covariance
$R_t > 0$. $w_t$ and the $v_t$'s are independent. The initial system state $x_0$ is Gaussian with zero mean and
covariance $\Sigma_0$. We assume $x_0$ is independent of $w_t$ and the $v_t$'s.

7.1.3  The  ARIMA(p,d,q)  Process  in  a  State-Space  Model 

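Harvey and Pierse [107] derive a state-space representation of a general ARIMA($p, d, q$) model.
With the backshift operator $L$ defined by $(Lz)_k = z_{k-1}$, the model reads

$$ \phi(L) \Delta^d y_t = \theta(L) e_t, $$

where $\phi(L) = 1 - \sum_{i=1}^{p} \phi_i L^i$ and $\theta(L) = 1 - \sum_{j=1}^{q} \theta_j L^j$ follow the sign convention of (7.3).
Let $r = \max(p, q + 1)$; the state transition equation can be written as an $(r + d) \times 1$ system

$$ x_t = A x_{t-1} + B e_t \tag{7.9} $$

with

$$ A = \begin{bmatrix}
\Phi & 0_{r \times d} \\
\begin{bmatrix} 1 & 0'_{r-1} \end{bmatrix} & \begin{bmatrix} \delta_1 & \cdots & \delta_d \end{bmatrix} \\
0_{(d-1) \times r} & \begin{bmatrix} I_{d-1} & 0 \end{bmatrix}
\end{bmatrix}, \qquad
\Phi = \begin{bmatrix}
\phi_1 & 1 & & \\
\vdots & & \ddots & \\
\phi_{r-1} & & & 1 \\
\phi_r & 0 & \cdots & 0
\end{bmatrix}, \qquad
B = \begin{bmatrix} 1 \\ -\theta_1 \\ \vdots \\ -\theta_{r-1} \\ 0_{d \times 1} \end{bmatrix}, $$

where $\phi_i = 0$ for $i > p$, $\theta_j = 0$ for $j > q$, and $-\delta_j$ is the coefficient on $L^j$ in the expansion of
$\Delta^d = (1 - L)^d$. This state-space representation has $p + q + 1$ hyperparameters and a measurement
equation given by

$$ y_t = C x_t \tag{7.10} $$

$$ \phantom{y_t} = \begin{bmatrix} 1 & 0'_{r-1} & \delta_1 & \cdots & \delta_d \end{bmatrix} x_t \tag{7.11} $$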
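The block structure of this representation can be sketched in a few lines of Python. This is an illustrative construction under the sign convention of (7.3), not the implementation used for the experiments; the function names `arima_state_space` and `simulate` are hypothetical.

```python
from math import comb


def arima_state_space(phi, theta, d):
    """Build (A, B, C) for an ARIMA(p, d, q) model in the block form of this section.

    phi, theta: AR and MA coefficients under the convention
                z_k - sum(phi_i z_{k-i}) = e_k - sum(theta_j e_{k-j}).
    """
    p, q = len(phi), len(theta)
    r = max(p, q + 1)
    phi_pad = list(phi) + [0.0] * (r - p)
    theta_pad = list(theta) + [0.0] * (r - 1 - q)
    # delta_j such that y_t = z_t + sum_j delta_j * y_{t-j}, from (1 - L)^d y_t = z_t.
    delta = [-((-1) ** j) * comb(d, j) for j in range(1, d + 1)]
    n = r + d
    A = [[0.0] * n for _ in range(n)]
    for i in range(r):                      # companion block for the ARMA part
        A[i][0] = phi_pad[i]
        if i + 1 < r:
            A[i][i + 1] = 1.0
    if d >= 1:
        A[r][0] = 1.0                       # row that reproduces y_{t-1}
        for j in range(d):
            A[r][r + j] = float(delta[j])
        for j in range(1, d):               # shift y_{t-1} down to y_{t-2}, ...
            A[r + j][r + j - 1] = 1.0
    B = [1.0] + [-t for t in theta_pad] + [0.0] * d
    C = [1.0] + [0.0] * (r - 1) + [float(x) for x in delta]
    return A, B, C


def simulate(A, B, C, shocks):
    """Run x_t = A x_{t-1} + B e_t, y_t = C x_t from a zero initial state."""
    x = [0.0] * len(B)
    ys = []
    for e in shocks:
        x = [sum(a * xi for a, xi in zip(row, x)) + b * e for row, b in zip(A, B)]
        ys.append(sum(c * xi for c, xi in zip(C, x)))
    return ys


A, B, C = arima_state_space(phi=[0.6], theta=[0.3], d=1)
ys = simulate(A, B, C, [1.0, 0.0, 0.0, 0.0])   # impulse response of the integrated series
```

Feeding a unit impulse through the matrices gives the same values as running the ARIMA(1, 1, 1) recursion directly, which is a quick sanity check on the block layout.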


7.1.4  Kalman  Filter  based  Exact  Maximum  Likelihood  Estimation  of  ARIMA 

The  Kalman  filter  [139]  is  a  recursive  algorithm  for  generating  Minimum  Mean  Square  Error 
(MMSE)  predictions  in  a  state  space  model.  The  state  space  representation  is  a  very  general 
formulation  for  linear  models  and  it  enables  the  Kalman  filter  to  deal  with  time  varying  parameters, 
measurement  errors  and  missing  observations  easily.  As  a  by-product,  if  Gaussian  errors  are 
assumed,  the  filter  allows  the  computation  of  the  log-likelihood  function  of  the  state  space  model. 
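This allows the model parameters to be easily estimated by maximum likelihood methods.

Standard Kalman Filter

$$ \hat{x}_{t|t} = E[x_t \mid Y_t] $$

$$ P_{t|t} = E[(x_t - \hat{x}_{t|t})(x_t - \hat{x}_{t|t})' \mid Y_t] $$

$$ \hat{x}_{t+1|t} = E[x_{t+1} \mid Y_t] $$

$$ P_{t+1|t} = E[(x_{t+1} - \hat{x}_{t+1|t})(x_{t+1} - \hat{x}_{t+1|t})' \mid Y_t] $$

$$ \hat{y}_{t+1|t} = E[y_{t+1} \mid Y_t] $$

where $Y_t$ denotes the observations up to time $t$ and $P_{t+1|t}$ is the covariance matrix of the estimation.

The Kalman filter comprises two steps.

The prediction phase for $\hat{x}_{t+1|t}$ and $P_{t+1|t}$ of the Kalman filter is independent of the observation
process, with:

$$ \hat{x}_{t+1|t} = A \hat{x}_{t|t} \tag{7.12} $$

$$ P_{t+1|t} = A P_{t|t} A' + Q \tag{7.13} $$

For the update phase of the Kalman filter, given the residual or prediction error

$$ e_t = y_{t+1} - C \hat{x}_{t+1|t} \tag{7.14} $$

and its estimated variance

$$ F_t = C P_{t+1|t} C' + R_t \tag{7.15} $$

the filtered state and covariance are

$$ \hat{x}_{t+1|t+1} = \hat{x}_{t+1|t} + P_{t+1|t} C' F_t^{-1} (y_{t+1} - C \hat{x}_{t+1|t}) \tag{7.16} $$

$$ P_{t+1|t+1} = A P_{t|t} A' + Q - P_{t+1|t} C' F_t^{-1} C P_{t+1|t} \tag{7.17} $$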
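The two phases can be illustrated with a scalar filter. The sketch below is a minimal version with time-invariant coefficients; the function name `kalman_step` and the numbers in the usage line are assumptions made for the example.

```python
def kalman_step(x_filt, p_filt, y_next, a, c, q, r):
    """One predict/update cycle of a scalar Kalman filter.

    x_filt, p_filt: filtered mean and variance at time t.
    Returns the filtered mean and variance at t+1, the residual e and its variance f.
    """
    # Prediction phase: propagate the mean and variance through the dynamics.
    x_pred = a * x_filt
    p_pred = a * p_filt * a + q
    # Residual (prediction error) and its variance.
    e = y_next - c * x_pred
    f = c * p_pred * c + r
    # Update phase with gain K = P C' F^{-1}.
    k = p_pred * c / f
    x_new = x_pred + k * e
    p_new = p_pred - k * c * p_pred
    return x_new, p_new, e, f


# Static state observed once with unit measurement noise, starting from a unit-variance prior.
x_new, p_new, e, f = kalman_step(0.0, 1.0, 2.0, a=1.0, c=1.0, q=0.0, r=1.0)
```

With equal prior and measurement variances the gain is one half, so the estimate moves halfway toward the observation and the variance is halved.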

7.1.5  The  Log-likelihood  function 

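Assuming that the noises are normally distributed, the log-likelihood function for the model
can be computed from the residual (prediction error) $e_t$ and its associated variance $F_t$:

$$ LL = -\frac{nT}{2} \log(2\pi\sigma^2) - \frac{1}{2} \sum_{t=1}^{T} \log|F_t| - \frac{1}{2\sigma^2} \sum_{t=1}^{T} e_t' F_t^{-1} e_t \tag{7.18} $$

where $n$ is the dimension of the observation, $T$ is the number of observations and $\sigma^2$ is the common
scale factor of the noise covariances. Due to the fact that

$$ \frac{\partial LL}{\partial \sigma^2} = -\frac{nT}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{t=1}^{T} e_t' F_t^{-1} e_t = 0, $$

we have

$$ \hat{\sigma}^2(\phi, \theta) = \frac{1}{nT} \sum_{t=1}^{T} e_t' F_t^{-1} e_t. \tag{7.19} $$

Thus the concentrated log-likelihood function of the model can be maximized with respect to
$(\phi, \theta)$ to find the Maximum Likelihood Estimate (MLE) of the hyperparameters:

$$ LL^*(\phi, \theta) = -\frac{nT}{2} \log(2\pi) - \frac{nT}{2} \log \hat{\sigma}^2(\phi, \theta) - \frac{1}{2} \sum_{t=1}^{T} \log|F_t| - \frac{nT}{2} \tag{7.20} $$

Maximizing (7.20) is equivalent to minimizing $nT \log \hat{\sigma}^2(\phi, \theta) + \sum_{t=1}^{T} \log|F_t|$.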
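For scalar observations ($n = 1$) the concentrated log-likelihood reduces to a few lines. The sketch below assumes the residuals and their scaled variances have already been produced by a Kalman filter pass; the function name and the sample numbers are illustrative.

```python
import math


def concentrated_loglik(residuals, variances):
    """Concentrated Gaussian log-likelihood for scalar observations (n = 1).

    residuals: prediction errors e_t; variances: their scaled variances F_t,
    i.e. Var(e_t) = sigma^2 * F_t with sigma^2 profiled out.
    """
    T = len(residuals)
    sigma2_hat = sum(e * e / f for e, f in zip(residuals, variances)) / T
    log_det = sum(math.log(f) for f in variances)
    ll = -0.5 * T * (math.log(2 * math.pi) + math.log(sigma2_hat) + 1.0) - 0.5 * log_det
    return ll, sigma2_hat


ll, sigma2_hat = concentrated_loglik([1.0, -1.0, 2.0, -2.0], [1.0, 1.0, 1.0, 1.0])
```

The returned value equals the full log-likelihood evaluated at the profiled variance, which is what makes the concentration step valid.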


Smoothing. Based on all information available up to time $t - 1$, the Kalman filter can also function
as a smoother, with the above-mentioned recursions working backwards in time to smooth the regression
model [106].

7.1.6  Identification  of  ARIMA  and  Model  Estimation 

Let $I$ be the set of indices corresponding to all the ingress points in the link load vectors $b_t$.
The series of subvectors $b_t^I$ will be the input data for model selection and parameter estimation².

² Note this choice is due to their ready availability and the fact that ingress traffic is largely invariant to internal
topology and routing changes.

Choice of the degree of differencing d*

Given that the optimal degree of differencing is often the one at which the standard deviation
of the differenced series is the lowest [60], we carry out the following steps $\forall d \in \{0, 1, 2, 3, 4\}$:

$$ Z_d = (1 - L)^d\, b_t^I \tag{7.21} $$

$$ E[Z_d] = \frac{1}{N} \sum_{i=1}^{N} Z_d^i \tag{7.22} $$

$$ \operatorname{Var}[Z_d] = \frac{1}{N} \sum_{i=1}^{N} \left(Z_d^i - E[Z_d]\right)^2 \tag{7.23} $$

where $Z_d^i$ is the $i$-th of the $N$ samples of the differenced series; then

$$ d^* = \arg\min_{d} \operatorname{Var}[Z_d] \tag{7.24} $$
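The selection of $d^*$ can be sketched as follows. The input here is a simulated random walk rather than ingress traffic, and the helper names are hypothetical; the point is only to show the variance comparison across candidate orders.

```python
import random
import statistics


def difference(series, d):
    """Difference the series d times (d = 0 returns it unchanged)."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series


def choose_d(series, candidates=(0, 1, 2, 3, 4)):
    """Return the d whose differenced series has the smallest variance."""
    variances = {d: statistics.pvariance(difference(series, d)) for d in candidates}
    return min(variances, key=variances.get), variances


# Hypothetical input: a random walk, whose first difference is white noise.
rng = random.Random(1)
walk, level = [], 0.0
for _ in range(500):
    level += rng.gauss(0.0, 1.0)
    walk.append(level)

d_star, variances = choose_d(walk)
```

For a random walk the undifferenced series has a large variance and over-differencing inflates the variance again, so the minimum is expected at one difference.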


Estimate $\phi$ and $\theta$ given $d^*$

Provided $(p, d, q)$ and the input vector series $\{b_t^I\}$, we can estimate the autoregression and
moving-average coefficients $\phi_i$ and $\theta_j$ by constructing a state-space model as (7.10) in Section 7.1.3 and
then applying the Kalman filter procedure as in Section 7.1.4 to compute the maximum
log-likelihood function $LL^*(\phi, \theta)$ (7.20) for each $(p, q)$ with $p, q \in \{0, 1, 2, 3, 4\}$.

Selection  on  Model  Order  (p,q) 

Information-based criteria, such as the Akaike Information Criterion (AIC) or the Bayesian
Information Criterion (BIC), are designed to achieve a good balance between model parsimony
and low prediction error [39, 60]. We use AIC as our model selection criterion, which generally is

$$ \mathrm{AIC} = 2k - 2\, LL^*(\phi, \theta) \tag{7.25} $$

where $k$ is the number of parameters in the statistical model, and $LL^*(\phi, \theta)$ is the maximized value
of the log-likelihood function for the estimated model (7.20). For each $(p, q)$ with $p, q \in \{0, 1, 2, 3, 4\}$ we estimate
$\phi$ and $\theta$ (as in Section 7.1.6) and compute the resulting AIC based on the residuals and the model
complexity. We then choose the pair $(p, q)$ with the lowest AIC:

$$ (p, q)^* = \arg\min_{p, q \in \{0, 1, 2, 3, 4\}} \mathrm{AIC} \tag{7.26} $$
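The grid search over model orders can be sketched as below, taking $LL^*$ to be the maximized log-likelihood and counting $k = p + q + 1$ parameters. The log-likelihood surface `toy_loglik` is invented purely to exercise the selection logic; in practice it would come from the Kalman filter fit of each candidate model.

```python
def select_order(loglik, max_order=4):
    """Pick (p, q) minimizing AIC = 2k - 2*LL over a (max_order + 1)^2 grid.

    loglik: callable (p, q) -> maximized log-likelihood of the fitted model.
    k counts the p AR and q MA coefficients plus the noise variance.
    """
    best = None
    for p in range(max_order + 1):
        for q in range(max_order + 1):
            k = p + q + 1
            aic = 2 * k - 2 * loglik(p, q)
            if best is None or aic < best[0]:
                best = (aic, p, q)
    return best


# Hypothetical log-likelihood surface: the fit improves up to (1, 2), then saturates.
def toy_loglik(p, q):
    return -100.0 + 5.0 * min(p, 1) + 3.0 * min(q, 2)


aic, p_star, q_star = select_order(toy_loglik)
```

Beyond the order at which the fit stops improving, every extra coefficient only adds the penalty term, which is how AIC trades accuracy against parsimony.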

7.2  Generalized  Likelihood  Ratio  Test  for  Identifying  Sudden 
Change  in  Dynamic  ARIMA  Model 

Willsky and Jones (1976) [277] introduced the window-limited GLR rules in the context of
detecting abrupt additive system changes in linear state-space models. Such abrupt system changes
may occur due to benign environmental changes, unintentional system component faults or
malicious activities. The idea is to implement a Kalman filter based on the assumption of no abrupt
system changes, and to monitor the measurement residuals of the filter to determine if a change
has occurred and adjust the filter accordingly.

Recall the state-space stochastic linear dynamical system (7.7) and measurement model (7.8)
in Section 7.1.2. If at an unknown time $\tau$ the system undergoes additive changes in the sense that
$u_t \mathbf{1}_{\{\tau \le t\}}$ is added to the right-hand side of (7.7), i.e.

$$ x_{t+1} = A_t x_t + w_t + u_t \mathbf{1}_{\{\tau \le t\}} $$

then the innovations are still independent Gaussian vectors with covariance matrices $F_t$, but their
means are $m_t = E(e_t) = \rho(t, \tau)\eta$ for $t \ge \tau$ instead of the baseline values $m_t = 0$ for $t < \tau$. After the
initialization $\rho(k, k) = 0$, $\alpha(k, k) = 0$, $\beta(k - 1, k) = 0$, the matrices $\rho(t, k)$ can
be evaluated recursively for $t \ge k$ through the following steps:

$$ \alpha(t + 1, k) = A_t \alpha(t, k) + I \tag{7.27} $$

$$ \beta(t, k) = A_{t-1} \beta(t - 1, k) + P_{t|t-1} C_t^T F_t^{-1} \rho(t, k) $$

$$ \rho(t + 1, k) = C_{t+1} \left( \alpha(t + 1, k) - A_t \beta(t, k) \right) $$
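A scalar version of this recursion is sketched below. It assumes time-invariant coefficients and a constant (steady-state) Kalman gain, neither of which is required by the general rule; the function name `signature` is hypothetical.

```python
def signature(k, t_end, a, c, gain):
    """Scalar version of the recursion for rho(t, k), t = k..t_end.

    a, c: time-invariant state and measurement coefficients.
    gain: Kalman gain K = P C' F^{-1}, assumed constant here.
    Returns a dict {t: rho(t, k)}.
    """
    alpha, beta = 0.0, 0.0        # alpha(k, k) = 0, beta(k-1, k) = 0
    rho = {k: 0.0}                # rho(k, k) = 0
    for t in range(k, t_end):
        beta = a * beta + gain * rho[t]        # beta(t, k)
        alpha = a * alpha + 1.0                # alpha(t+1, k)
        rho[t + 1] = c * (alpha - a * beta)    # rho(t+1, k)
    return rho


rho = signature(k=0, t_end=3, a=1.0, c=1.0, gain=0.5)
```

In this random-walk example a unit shift added to the state at each step shows up as a growing bias in the innovations, which levels off as the filter partly tracks the change.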

7.2.1  Detection  Rules 

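Without assuming any prior knowledge of the parameter $\eta$, the GLR rule maximizes the log-likelihood
ratio over a window of inputs and decides the time to raise an alarm according to the following
rule:

$$ N_W = \inf\Big\{ n : \max_{n-M \le k \le n-M'} \sup_{\eta} \sum_{i=k}^{n} \log\Big[ f\big(F_i^{-1/2}(e_i - \rho(i, k)\eta)\big) \big/ f\big(F_i^{-1/2} e_i\big) \Big] \ge c \Big\} $$

$$ \phantom{N_W} = \inf\Big\{ n : \max_{n-M \le k \le n-M'} \Big(\sum_{i=k}^{n} \rho^T(i, k) F_i^{-1} e_i\Big)^T \Big(\sum_{i=k}^{n} \rho^T(i, k) F_i^{-1} \rho(i, k)\Big)^{-1} \Big(\sum_{i=k}^{n} \rho^T(i, k) F_i^{-1} e_i\Big) \Big/ 2 \ge c \Big\} \tag{7.28} $$

where $f(y) = e^{-\|y\|^2/2} / (2\pi)^{\zeta/2}$ denotes the $\zeta$-dimensional normal density, $\zeta = \dim(\eta)$, and $M' + 1 \ge \zeta$,
so that the matrix inversions in (7.28) are valid.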
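For scalar innovations the quadratic form in the rule collapses to a ratio of two sums, which the sketch below evaluates over every admissible window. The signature is passed in as a function; the usage lines assume the simplest case of a constant signature and a deterministic residual sequence, both chosen only for illustration.

```python
def glr_alarm_time(residuals, variances, rho, threshold, m_max, m_min):
    """Window-limited GLR rule for scalar innovations.

    residuals[i], variances[i]: innovation e_i and its variance F_i.
    rho(i, k): signature of a change at time k on innovation i.
    Candidate change times k range over n - m_max <= k <= n - m_min.
    Returns the first n at which the statistic reaches the threshold, or None.
    """
    for n in range(len(residuals)):
        for k in range(max(0, n - m_max), n - m_min + 1):
            num = sum(rho(i, k) * residuals[i] / variances[i] for i in range(k, n + 1))
            den = sum(rho(i, k) ** 2 / variances[i] for i in range(k, n + 1))
            if den > 0 and num * num / (2.0 * den) >= threshold:
                return n
    return None


# Twenty quiet innovations followed by a sustained shift of size 3.
residuals = [0.0] * 20 + [3.0] * 10
variances = [1.0] * 30
alarm = glr_alarm_time(residuals, variances, lambda i, k: 1.0, 5.5, m_max=10, m_min=1)
```

With these inputs the first shifted innovation alone does not reach the threshold; the alarm is raised one step later, once two shifted innovations fall inside a short window.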

In essence, we are looking at an optimal stopping time problem: not to stop too early and
produce a false alarm, nor to stop too late and miss a real anomalous event.

7.2.2 Threshold and Window Size Choice

Note that (7.27) computes $\rho(t, k)$ recursively over each window. How to optimally choose
$M$, $M'$ and $c$ in general is a difficult problem [25] for online practice, particularly due to the
coupling effect between the threshold and the window size on the asymptotic performance of the
detection rule. For off-line operations, however, the choice of window size is less demanding: as the
whole data set is available, it is only a matter of computation time.

The threshold $c$ in the rule $N_W$, subject to the false alarm probability criterion $P_0(N_W \le m) = \alpha$
for a prescribed level $\alpha$, can be computed by using Monte Carlo computation of $P_0(N_W \le m)$ together
with the method of successive linear approximation combined with bisection search for iterative
solution of the equation $P_0(N_W \le m) = \alpha$.
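A simplified version of this calibration is sketched below. It uses plain bisection only (the successive linear approximation step is omitted), unit innovation variances and a constant signature, and a small number of Monte Carlo runs; all of these are simplifying assumptions, and the function names are hypothetical.

```python
import random


def max_glr_stat(residuals, m_max, m_min):
    """Largest window-limited GLR statistic over a run (unit variances, rho = 1)."""
    best = 0.0
    for n in range(len(residuals)):
        for k in range(max(0, n - m_max), n - m_min + 1):
            window = residuals[k:n + 1]
            best = max(best, sum(window) ** 2 / (2.0 * len(window)))
    return best


def calibrate_threshold(alpha, horizon, runs=200, m_max=10, m_min=1, seed=0, tol=1e-3):
    """Bisection on c so that the simulated false alarm probability is at most alpha."""
    rng = random.Random(seed)
    # Simulate change-free runs and record the largest statistic seen in each.
    maxima = [
        max_glr_stat([rng.gauss(0.0, 1.0) for _ in range(horizon)], m_max, m_min)
        for _ in range(runs)
    ]

    def false_alarm_prob(c):
        # A run raises a false alarm within the horizon iff its maximum reaches c.
        return sum(m >= c for m in maxima) / runs

    lo, hi = 0.0, max(maxima) + 1.0      # no run alarms at hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if false_alarm_prob(mid) > alpha:
            lo = mid                     # too many false alarms: raise the threshold
        else:
            hi = mid
    return hi, false_alarm_prob(hi)


c, p_fa = calibrate_threshold(alpha=0.05, horizon=50)
```

The upper end of the bracket always satisfies the false alarm constraint, so the returned threshold is feasible by construction; more runs tighten the Monte Carlo estimate at the cost of computation time.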


7.3  Experiments 

Because ARIMA data sets share basic model characteristics, and in the interest of time and
access, at the current stage we have used two small publicly available ARIMA time series
datasets [53, 57], in addition to simulation data and synthetic anomaly generation, to test our
method.

In order to broaden the scope of anomalies, we inject synthetic ones into the data set in a
fashion similar to [254]:

• Extract the long-term statistical trend from the data set by smoothing the original signal.

• Add Gaussian noise to the smoothed signal.

• Add different anomaly combinations in terms of number, time and strength.

As shown in Figure 7.1, the synthetic dataset captures the trend in the original dataset, which
makes the simulation more plausible.
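The three steps above can be sketched as follows. The centered moving average stands in for whatever smoother was actually used, and the input series, window width and noise level are made up for the example; only the injection interval (60 to 65) echoes Figure 7.1.

```python
import random


def moving_average(series, width):
    """Centered moving average used here as a simple smoother."""
    half = width // 2
    out = []
    for i in range(len(series)):
        window = series[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out


def synthesize(series, noise_std, anomalies, width=9, seed=0):
    """Smooth, add Gaussian noise, then inject additive anomalies.

    anomalies: list of (start, end, strength); strength is added on [start, end).
    """
    rng = random.Random(seed)
    trend = moving_average(series, width)
    noisy = [v + rng.gauss(0.0, noise_std) for v in trend]
    injected = list(noisy)
    for start, end, strength in anomalies:
        for i in range(start, min(end, len(injected))):
            injected[i] += strength
    return trend, noisy, injected


# Hypothetical input series with a slow upward trend.
series = [10.0 + 0.1 * i for i in range(100)]
trend, noisy, injected = synthesize(series, noise_std=0.5, anomalies=[(60, 66, 5.0)])
```

Varying the list of `(start, end, strength)` tuples gives the different combinations of anomaly number, time and strength.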

7.3.1  Detection  Rates 

For the real ARIMA dataset, we adjust the portion of the dataset investigated by the
detection algorithm as a way to control the occurrence of the anomalies, whereas for the synthetic
dataset, the number or size of the anomalies is easily controlled by the dosage of artificial
anomalies that we inject. Note that the synthetic dataset is considered anomaly-free before any
injection, as it is a product of smoothing and de-noising the original dataset. When using the
synthetic dataset, each result is based on 1000 simulations.

Sensitivity to Window Size. Although theoretically all window sizes can be computed
precisely, we still would like to observe how they affect the performance of detection. Without
analytically specifying a precise window size that achieves asymptotic optimality, there is a tradeoff
between the window size and the detection sensitivity. When the window is too long, the
recursive Kalman filtering itself may gradually smooth out the edginess of the anomaly. When the
window is too short, the maximization required by the generalized likelihood may not be
adequately met.

Figure 7.1: Steps for synthetic generation of anomaly, where the last panel is the synthetic data
with anomaly injected at time period from 60 to 65.

Figure 7.2: Detection Rate (with different window sizes) in response to the anomaly size N

Note that for the synthetic dataset shown in Figure 7.2, when the anomaly size is 4, the detection
performance seems to degrade quite a bit. The likely explanation is that we lump 3 anomalies
close together in this case while keeping them well separated in the other size cases.

Sensitivity  to  Threshold 

Similarly, it is interesting to verify how sensitive the detection rate is to the threshold chosen
for the detection rule. As shown in Figure 7.3, we pick an arbitrary threshold at 8 to contrast with
the other two cases, of which 5.5 is the value calculated through formal derivation and the same
value used for testing the detection sensitivity to window size in Section 7.3.1. When the
threshold is too high (8 here), the missed-detection rate rises. Note that having 3 anomalies close
together somehow made the high-threshold case work better on the synthetic dataset than on the
real dataset, where the 3 anomalies are rather isolated. When the threshold is too low (3 here),
the false alarm rate rises.

Figure 7.3: Detection Rate (with different thresholds) in response to the anomaly size N.
Panels: Anomaly N in Real Data; Anomaly N in Synthetic Data.

7.3.2  Detection  Delay 

Our method has a delay of at least the minimum window length in issuing alarms. This is
because at every time step it requires a maximization over a window of data points in order to
calculate the generalized likelihood, in exchange for not demanding any a priori knowledge of
the potential anomalies.

Sensitivity to Anomaly Strength: When using the synthetic dataset with injected anomalies,
we notice that the proposed Kalman-GLR scheme has a longer mean detection delay for weaker
anomalies (and is more prone to false alarms when detecting anomalies with a smaller threshold).
In Figure 7.4, a mean delay time beyond 100 in fact means a missed detection, as the magnitude of
the anomaly is too weak to be detected.


7.4  Discussion 

In this chapter, we describe the comprehensive procedure of building an ARIMA model and
propose to identify anomalies during the process of model parameter estimation with the aid of
the Kalman filter and the GLR test. This approach also prevents such anomalies from poisoning the
baseline-building.

As a next step, we plan to test the robust methodology developed in [308]. Furthermore, we would
like to apply our method to traffic data collected from the Abilene network [1], with a view toward
simplifying threshold-setting.

Figure 7.4: Mean Detection Delay (under different thresholds) in response to the anomaly size N

Chapter 8

Anomaly Detection for Clean Energy
Resources Prediction and Power
Consumption Forecast in the Smart Grid


A  tale  of  two  cities 


This chapter shows further development of RGLRT and an application closely related to
anomaly detection for SCADA systems and smart grids, i.e. anomaly detection for both clean energy
resources prediction and power consumption forecast [303]. The advancement in computing and
hardware technologies ushers in a new era. While the utilization of clean energy resources,
including wind and solar power, is set to grow from filling the gap of peak hours to taking a larger
share in the upcoming smart grid and efficient infrastructure, price-incentivized electricity
consumption should alleviate peak hours and reduce power outages. But anomalies, including both
benign faults and malicious attacks, threaten the reliability and availability of the new grid. To
address these twin problems, we approach them from the angle of one fundamental technique they
share. The Autoregressive Integrated Moving Average (ARIMA) time series models play roles at both
ends of this new ecosystem: namely, predicting the variable clean energy resource on the supply
side and forecasting the flexible load demand on the consumer side. Model construction and
anomaly detection based on such models are often a first and important step towards monitoring
unexpected problems and assuring the soundness and security of the systems being studied. The
time variability of the coefficients in those dynamic regression models is particularly relevant and
possibly indicative. Thus we introduce a corresponding framework and a novel anomaly detection
approach based on a robustified Kalman filter for identifying those dynamic models, including their
parameters, and a Robust General Likelihood Ratio (RGLR) test for detecting suspicious changes in
the parameters and therefore in the models. Currently, the effectiveness and robustness of this
method are shown through simulation. At the two ends of the smart grid, both the clean energy
resource supply and the electric power consumption require reliable and accurate prediction.

Variable Clean Energy Resources Prediction. With the integration of clean energy into
electricity grids, it is becoming increasingly important to obtain accurate forecasts. Advancements in
wind and solar forecasting technology aim to make renewable energy reliability a reality. In
particular, due to its versatility, the Autoregressive Integrated Moving Average (ARIMA) time series
model is popular in industrial and engineering applications such as wind power and solar energy
level prediction and power grid load forecasting [113], [84]. For example, Kavasseri et al. studied
day-ahead wind speed forecasting using f-ARIMA models [141]; Nielsen et al. built a wind power
prediction system based on ARIMA [200], [199]; Makarov et al., from the California Independent
System Operator (ISO) wind generation and forecasting service, deemed ARIMA models to be the
persistence models suitable for short-term wind generation forecasting and real-time dispatch in the
Grid Control Centers [170]; and Milligan et al. applied ARIMA models to both wind speed and wind
power output [184]. For a more comprehensive and state-of-the-art survey on short-term prediction
of wind power, interested readers may refer to [84].

ARIMA  models  also  suit  the  needs  of  the  demand  side  of  smart  grid. 

Flexible Smart Grid Load Demand Forecast. In general, ARIMA models address well the
issue of high-level short-term hourly load forecasting in traditional power grids [10]. Furthermore,
ARIMA modeling techniques show their prowess in capturing the flexible and price-sensitive
short-term hourly overall load demand response enabled by the deployment of the smart grid [55].
Given that buildings, one of the key drivers of smart grid deployment, consume approximately
73% of the total electrical energy in the United States [145], it is effective to monitor electricity
consumption down to the building level. ARIMA models have been applied to building-related
applications ranging from modeling building electricity consumption [198] and forecasting
and controlling the peak demand in commercial buildings [114], to optimizing the operation
of cold storage in a large building [146].

The ubiquitous integration of computers in the smart grid (in generation, transmission,
distribution and metering in homes) also introduces malicious security risks, besides benign faults,
throughout the system [143], [68]. Stuxnet [70], one of the most sophisticated control system
malware known to date, has become the game changer in the field, in terms of demonstrating the
severity of such issues and therefore raising people's awareness of them¹ [274], as described by
Falliere et al. at Symantec [70]. As of April 21, 2011, more than 50 new Stuxnet-like attacks had been
discovered [194] that pose threats to Supervisory Control and Data Acquisition (SCADA),
the underlying control system of the smart grid. The sources of vulnerabilities can be generic
and broad. Thus our fault and threat model is impact-oriented. We analyze the consequences of
their occurrence as manifested in the data that would sway the model construction for both the
clean energy resource supply and the power consumption forecast, without excluding the cases where
adversaries purposely poison the model construction.

The idea of ARIMA-based anomaly detection is to check whether the data deviate far from
the model prediction. Thus the accuracy of the model construction itself is important.

Alternatively, the CUSUM (Cumulative Sum) method and its variants are widely used for
anomaly detection. As pointed out in [25], [254], its major drawback is that it requires a priori
knowledge of the post-change information, e.g. the intensity of the anomaly. But in practice, such
information is not predictable.

¹ In McAfee's report [18], nearly half of those surveyed in the electric industry said that they had found
Stuxnet on their systems.

We look at the problem from a novel angle and take advantage of the by-product of the
parameter learning and estimation process in the ARIMA model building stage to pre-screen possible
anomalies without incurring a drastic extra computational burden. This also prevents those anomalies
from poisoning the model- and baseline-building from the start. We guard against the skewing and
deviating effect of outliers on the identification procedure by applying robustifying
measures and integrating a recursive variant of the M-estimator, based on the Huber function [119],
into the Kalman filter [139] via a recursively reweighted least squares implementation. Our Robust
General Likelihood Ratio test rectifies and cleans data in the presence of both isolated and patchy
outliers while maintaining the optimality of the Kalman filter under the nominal condition.
Furthermore, it can be shown theoretically that our procedures achieve quickest, optimal detection;
thus we can achieve the goal of 'nipping it in the bud'. The robust sequential testing has an optimal
stopping time, i.e. asymptotically the shortest detection delay while maintaining the lowest false
alarm rate. In the interest of brevity, readers can refer to Chapter 6 and Chapter 7 for more details.
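The flavor of a Huber-reweighted measurement update can be conveyed with a scalar sketch. This is an illustrative variant only, not the procedure developed in Chapter 6: it inflates the measurement variance by the inverse Huber weight and repeats the update a fixed number of times. The tuning constant 1.345 is the conventional Huber choice, and the function names are hypothetical.

```python
def huber_weight(standardized_residual, k=1.345):
    """Huber weight: 1 inside [-k, k], k/|r| outside."""
    r = abs(standardized_residual)
    return 1.0 if r <= k else k / r


def robust_update(x_pred, p_pred, y, c, r_var, k=1.345, iterations=5):
    """Scalar Kalman measurement update with Huber reweighting of the noise variance.

    Each pass recomputes the weight from the current residual and inflates the
    measurement variance to r_var / weight, so outliers pull the estimate less.
    """
    x = x_pred
    for _ in range(iterations):
        resid = (y - c * x) / r_var ** 0.5
        w = huber_weight(resid, k)
        f = c * p_pred * c + r_var / w
        gain = p_pred * c / f
        x = x_pred + gain * (y - c * x_pred)
    p = p_pred - gain * c * p_pred
    return x, p


x_in, p_in = robust_update(0.0, 1.0, y=0.5, c=1.0, r_var=1.0)     # ordinary observation
x_out, p_out = robust_update(0.0, 1.0, y=10.0, c=1.0, r_var=1.0)  # gross outlier
```

For an ordinary observation the weights stay at one and the update coincides with the standard Kalman filter; for the outlier the estimate moves much less than the standard update would move it.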


8.1  Experiments 

8.1.1  Data  Sets  -  Real  Wind  Power  Data 

The Transmission Expansion Planning Policy Committee (TEPPC) of the Western Electricity
Coordinating Council (WECC) provided us with wind power data. In particular, we use its CA2
location profile (CA2 includes Westwind, Antelope and other substations in California), with 3570
MW capacity as of 2006, as shown in Figure 8.1.

The difference order $d = 1$ is easy to identify visually from the autocorrelation plot shown in
Figure 8.2.

Due to the non-stationarity in the raw data series, its mean and variance diverge as time
proceeds.

8.1.2  Simulated  Data 

To illustrate the commonality shared by variable clean energy and power consumption in
terms of basic model characteristics, and in the interest of time and access, we employ, without
loss of generality, a simulated ARIMA data set as shown in Figure 8.3.

8.1.3  Fogies  Attack 

An attacker can manipulate the data through means such as protocol defects, social
engineering, man-in-the-middle attacks and other SCADA- and smart-grid-specific attacks [296] to
accomplish their goals.

Random outliers are injected into the data set to capture this effect, as shown in Figure 8.4.

Figure 8.1: Wind Power Hourly Measurements: (Top) 2006 Whole Year, (Bottom) 10 days of
Midsummer 2006.

Figure 8.2: The Autocorrelation Plot

Figure 8.3: Simulated ARIMA Data: (Top) One Year, (Bottom) 10 days of Midsummer.

Figure 8.4: Simulated ARIMA Data: (Top) 10 days of Midsummer, (Bottom) With Outliers.

8.1.4 Countermeasure strategy - Parry

In light of the stealthiness of Stuxnet and the long-term hazard of a deviated baseline induced
by likely furtive attackers, the main part of our work can serve as a prevention measure in the sense
that we take precautions during the model-building stage to prevent attackers from landing their
intrusions early on³.

Because ARIMA data sets share basic model characteristics, and in the interest of time and
access, at the current stage we have used two small publicly available ARIMA time series
datasets [53, 57], in addition to simulation data and synthetic anomaly generation, to test our
method.

With randomly injected outliers, and with the false alarm constraint achieved through Monte
Carlo simulation, our approach successfully detects them.

8.1.5  Performance  Analysis 

Comparison  with  GLR 

Because GLR is based on the standard Kalman filter and assumes that the dynamics after the
change remain Gaussian, it does not function well when outliers are injected into the raw data
sequence.


8.2  Discussion 

With the ever-rising demand for clean energy and the fast-increasing deployment of the smart grid
on the horizon, the generic nature of this study shows promising utility in proactively
suggesting a feasible solution to anomaly detection, covering benign faults and malicious
attacks, for both variable clean energy resource supply and flexible power consumption. As a next
step, we plan to apply it to real wind data in conjunction with simulated user demand sensitive to
pricing.


³ In fencing, the primary function of a parry is to prevent an opponent's attack from landing.

Chapter 9

Conclusion and Future Plans


In this dissertation, the landscape of cyber attacks and intrusion detection systems for SCADA
systems has been clearly outlined. As an initial effort, an in-depth SCADA-specific security
solution, Xware, is proposed. A versatile early detection scheme, RGLRT, along with a resilient
estimation approach, shows its effectiveness in detecting anomalies.


9.1  RGLRT 

The strength of RGLRT lies in that it does not require a priori knowledge of the distributions
of the attacks or benign anomalies, i.e., neither their mean nor their variance, which is a clear
advantage over SPRT in real life. Furthermore, its close relation with the state-space setting and
the Kalman filter gives it a special advantage over non-parametric CUSUM in the engineering
field. I have explored two main types of application, namely

• to detect outliers and anomalies through measurements in the Kalman filter when the latter
is used for prediction and estimation of a dynamical model;

• to detect outliers and anomalies in the parameters of a model (ARIMA, to be specific) by way
of state variables in the Kalman filter when the latter is used for parameter estimation.

How to expand the application range of RGLRT is the next step that I am pursuing. Practically,
the task of simplifying the window size selection still deserves more consideration.


9.2  Resilient  Control 

So far, this dissertation work has shown the promise of resilient estimation and the potential
of resilient control. Much theory development is needed in the niche of resilient control versus
the conventional robust control and minimax approach. With smart grids and the new intelligent
infrastructure on the horizon, the concept of resilient control has profound meaning and impact on
the technicalities of that development as well.

9.3 Network Intrusion Detection

Network intrusion detection research for SCADA systems to date has been quite limited, with
the three most prominent and critical deficiencies being: the lack of a well-considered threat
model; the failure to address false alarm and false negative (mis-detection) rates; and the
need to empirically ground the development of IDS mechanisms in the realities of how such
systems operate in practice, including the diversity of traffic they manifest and the need to tailor IDS
operation to different SCADA environments. To this end, I focus on developing flexible,
comprehensive SCADA-oriented IDS analysis; I do not endeavor to provide rigorous, all-encompassing
SCADA security.

I will begin by considering how to categorize cyber attacks effectively into taxonomies that
illuminate the problem space along three distinct dimensions: (1) how attacks manifest in
network traffic (defense perspective); (2) how attacks are constructed and what resources are
required to realize them (attacker and prevention perspective); and (3) the damage implications of
different types of attacks (victim perspective). I next aim to capture the characteristics of a
specific SCADA system under study (a segment of the power grid) with full situational awareness,
including the dynamics of the physical plant being monitored, its communication patterns, system
architecture, network traffic behavior, and the specific application-level protocols used, ranging
from the dominant Modbus/TCP and DNP3 to newer protocols such as WirelessHART and ISA100.
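
To make the three taxonomy dimensions concrete, the following Python sketch records each attack as
one entry with a field per dimension and indexes the entries by any one of them. The class name
AttackEntry, its field names, the helper group_by, and the two sample entries are illustrative
assumptions only; they are not taken from the taxonomies developed in this dissertation.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class AttackEntry:
    """One taxonomy entry (hypothetical layout), one field per dimension."""
    name: str
    traffic_manifestation: str  # defense perspective
    construction: str           # attacker and prevention perspective
    damage: str                 # victim perspective


def group_by(entries: Iterable[AttackEntry], dimension: str) -> Dict[str, List[str]]:
    """Index attack names by the value they take along one dimension."""
    index: Dict[str, List[str]] = {}
    for entry in entries:
        index.setdefault(getattr(entry, dimension), []).append(entry.name)
    return index


# Illustrative entries only; the wording is an assumption, not measured data.
entries = [
    AttackEntry(
        name="unauthorized setpoint write",
        traffic_manifestation="well-formed write from an unexpected host",
        construction="access to the control network",
        damage="process disruption",
    ),
    AttackEntry(
        name="request flood",
        traffic_manifestation="abnormal request volume",
        construction="access to the control network",
        damage="loss of view",
    ),
]

by_construction = group_by(entries, "construction")
by_damage = group_by(entries, "damage")
```

Keeping the three perspectives as separate fields lets the same set of attacks be viewed from the
defender's, the attacker's, or the victim's side without duplicating entries.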

After studying this SCADA system, I will develop attack trees and derive prudent threat models
from them. This will include consideration of the evasion mechanisms that attackers can employ in
light of the applications in use (beyond those already known for TCP/IP). I will derive
application-level protocol specifications and implementation specifics, and from these construct
analyzers for an open-source IDS.

At the heart of this effort I envision the development of “normalcy checking,” i.e., a combination
of techniques designed to capture two envelopes of possible system activity: (1) definitely safe
operations and (2) definitely unsafe operations. When identifiable, the first of these can be
safely ignored; the second merits immediate attention or blocking; and the middle ground between
the two requires additional analysis. The first technique I will draw upon in this regard is
specification-based intrusion detection, which constructs the control system’s overall allowable
behavior as seen from the application level, reflecting the monitored plant dynamics, including
its valid extreme cases. The second uses encodings of misuse signatures and their possible
variants. The third draws upon models derived from the control system’s formal dynamics; this
aspect is unique to the problem domain and holds great promise for refining the scope to which I
will apply the analysis.

I will draw upon traces of live operation to develop and tune this system. I will incorporate our
detection mechanisms into a NIDS to realize an operational system, validating its efficacy using,
first, commercial SCADA emulation software; then synthesized traffic created in the DETER
testbed; then new traces from the operational environments; and finally live “shadow” operation.
For our testbed, we will construct a test environment consisting of physical PLCs and IEDs to
emulate the SCADA system under study. There we will inject designed attack traffic along with
traffic synthesized from traces separate from those used in developing and tuning the system, in
order to assess false-positive and false-negative rates. The final proof, necessarily, will come
from prototype in situ deployment, which will require ongoing interactions with the SCADA
system’s operational staff.
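
As a minimal sketch of the normalcy-checking idea, the Python fragment below sorts a single
application-level event into the definitely safe envelope, the definitely unsafe envelope, or the
middle ground. Everything concrete in it is an assumption made for illustration: the event type
SetpointWrite, the per-register Envelope limits, the register numbers, and the rule that a write
to a blocked register counts as a misuse signature. It covers only the specification-based and
signature-based techniques; the dynamics-based models are not represented.

```python
from dataclasses import dataclass
from enum import Enum
from typing import AbstractSet, Mapping


class Verdict(Enum):
    SAFE = "definitely safe"      # can be ignored
    UNSAFE = "definitely unsafe"  # alert or block immediately
    UNCERTAIN = "needs analysis"  # middle ground between the two envelopes


@dataclass(frozen=True)
class SetpointWrite:
    """Hypothetical application-level event: a write to one setpoint register."""
    register: int
    value: float


@dataclass(frozen=True)
class Envelope:
    """Hypothetical per-register limits taken from a plant specification."""
    normal_low: float   # routine operating range
    normal_high: float
    hard_low: float     # widest valid range, including valid extreme cases
    hard_high: float


def triage(event: SetpointWrite,
           spec: Mapping[int, Envelope],
           blocked_registers: AbstractSet[int]) -> Verdict:
    # Misuse signature (assumed): a write to a register that must never be written.
    if event.register in blocked_registers:
        return Verdict.UNSAFE
    envelope = spec.get(event.register)
    # The specification says nothing about this register, so defer the decision.
    if envelope is None:
        return Verdict.UNCERTAIN
    # Outside the widest valid range: definitely unsafe.
    if not (envelope.hard_low <= event.value <= envelope.hard_high):
        return Verdict.UNSAFE
    # Inside the routine operating range: definitely safe.
    if envelope.normal_low <= event.value <= envelope.normal_high:
        return Verdict.SAFE
    # Valid but unusual: requires additional analysis.
    return Verdict.UNCERTAIN


spec = {40001: Envelope(normal_low=10.0, normal_high=90.0, hard_low=0.0, hard_high=100.0)}
blocked = {40010}
for value in (50.0, 95.0, 120.0):
    print(value, triage(SetpointWrite(40001, value), spec, blocked).name)
# prints: 50.0 SAFE, then 95.0 UNCERTAIN, then 120.0 UNSAFE
```

Returning an explicit UNCERTAIN verdict, rather than forcing a binary decision, keeps the middle
ground visible so that it can be passed on to further analysis.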
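
The false-positive and false-negative rates mentioned for the testbed evaluation could be computed
along the following lines, given ground-truth labels for the injected attack traffic. The function
name error_rates and the boolean-list encoding are assumptions made for this sketch, and the six
sample events are made up purely to show the arithmetic.

```python
from typing import Sequence, Tuple


def error_rates(is_attack: Sequence[bool], alarmed: Sequence[bool]) -> Tuple[float, float]:
    """Return (false-positive rate, false-negative rate).

    is_attack[i] is True if event i was injected attack traffic;
    alarmed[i] is True if the IDS raised an alarm on event i.
    """
    if len(is_attack) != len(alarmed):
        raise ValueError("label and alarm sequences must have the same length")
    false_pos = sum(1 for truth, alarm in zip(is_attack, alarmed) if alarm and not truth)
    false_neg = sum(1 for truth, alarm in zip(is_attack, alarmed) if truth and not alarm)
    attacks = sum(1 for truth in is_attack if truth)
    benign = len(is_attack) - attacks
    # A rate is undefined when its class has no events at all.
    fpr = false_pos / benign if benign else float("nan")
    fnr = false_neg / attacks if attacks else float("nan")
    return fpr, fnr


# Made-up example: 2 attack events and 4 benign events.
labels = [True, True, False, False, False, False]
alarms = [True, False, True, False, False, False]
fpr, fnr = error_rates(labels, alarms)
print(fpr, fnr)  # prints: 0.25 0.5
```

Here one of the four benign events raised an alarm and one of the two attacks was missed, which
gives the two rates shown.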